ABSTRACT

We explore the effects of the size of the box that we use for simulations of the
intergalactic medium (IGM) at redshift two. We examine six simulations from the
hydrodynamic code ENZO using the same cosmological and astrophysical input parameters
and cell size, but different box sizes. We study the CDM distribution and many statistics
of the Lyα forest absorption from the IGM. Larger boxes have fewer pixels with significant
absorption (flux < 0.96), more pixels in longer stretches with little or no absorption,
and they have wider Lyα lines. The larger boxes differ only because they include
power from long wavelength modes that do not fit inside the periodic conditions of
the smaller boxes. The long modes change the density, velocity and temperature fields,
and they increase the gas temperature. Small simulations are too cold compared
to larger ones. When we deliberately increase the heat we put into the IGM, we can
approximate the Lyα forest in a simulation of twice the size. When we double the
box size, the difference of most statistics from their value in our largest 76.8 Mpc box
is reduced by approximately a factor of two. Most of the statistics converge towards
their value in the simulation with the largest box size, though line widths are not
yet converged and the most common value of the CDM density shows no sign of
converging, because the larger boxes include places with ever higher densities. These
regions are not in the IGM, but they may produce the strongest of Lyα lines.

When we double the box size from 38.4 Mpc to 76.8 Mpc, the mean Lyα absorption
decreases by 0.5%, the frequency with which we encounter different common CDM
densities changes by 2%, typical Lyα line widths, the frequency of flux values and the
power spectrum of the flux all change by 4-7%, and the column density distribution
changes by up to 15%. When we compare to the errors in data, we find that our
76.8 Mpc box is larger than we need for the mean flux, barely large enough for the
column density distribution and the power spectrum of the flux, and too small for
the line widths, which increase by 1 km s^-1 when we increase the box from 38.4 Mpc
to 76.8 Mpc, which is approximately the error in data. We can most readily see the
effects of the long wavelength modes in measurement on the smallest scales in the Lya 
forest, the line widths, because they are easier to measure than the long wavelength 
power. Our optically thin simulations have a factor of several too few lines with H I 
column densities > 10^^ cm~^. Reducing the cell size from 75 to 18.75 kpc is not a 
solution. Our simulated spectra have 20% less power than data on small scales and 
50% less on large scales, and their Lyα lines are 2.6 km s^-1 too wide. We do not see
how our simulations might match all data at z = 2. Reducing the cell size to 18.75 kpc
lowers the Lyα line widths by 1.8 km s^-1, but radiative transfer effects can increase
them by as much as 1.3 km s^-1 at z = 2.5. We might reduce line widths using a softer
ionizing spectrum to reduce heating, or we could use σ_8 > 0.9, which has the additional
benefit of increasing the large scale power. It is hard to see how simulations using 
popular cosmological and astrophysical parameters can match the Lya forest data at 
z = 2. 

Key words: quasars: absorption lines - cosmology: observations - intergalactic 
medium - numerical simulations.
1 INTRODUCTION

We are exploring the physical conditions in the IGM and
the history stored in those conditions. We retrieve the physical
conditions by finding numerical simulations that give
simulated spectra whose H I Lyα forest absorption is
statistically similar to that of real spectra. Data on the IGM
can give relative errors of the order of a few per cent for
statistical properties of the forest. In Jena et al. (2005) (J05)
we showed that at redshift 1.95 our simulations using typical
cosmological and astrophysical parameters gave a good
match to both the mean flux transmitted in the Lyα forest
and the b-value distribution that we use to describe the Lyα
line widths. We noted that the power spectrum of the flux
for these simulations was broadly similar to that of data.
However, we spent little time with the power, and we did
not attempt to run any simulations that gave exactly the
mean flux and b-values of data.

Here we explore one aspect of the accuracy of our simulations,
the dependence on the size of the simulation box. We
discuss various statistics, including the mean flux, the flux
probability distribution function (pdf), the typical b-value,
the pdf of the b-values, the power and autocorrelation of the
flux and the density and power of the CDM. We restrict
our attention to one cosmological model at one epoch, and
we say little or nothing about other relevant factors such as
redshift evolution, the tilt of the initial power spectrum of
fluctuations and a measure of the amplitude of the density
fluctuations today, σ_8. We also do not discuss other important
aspects of the simulations, including the accuracy of
the initial conditions, the redshift where the simulations begin
(Heitmann et al. 2006; Lukić et al. 2007), the accuracy
of the potential and hydrodynamical evolution (Regan et al.
2007), the ionization and heating and the lack of radiative
transfer. In J05 we showed how cosmological and astrophysical
parameters, and the box and cell size, change the mean
flux and b-values. Here we cover many more statistics of the
Lyα forest in a more quantitative manner.

It is now well known (Kauffmann & Melott 1992; Pen
1997; Barkana & Loeb 2004; Sirko 2007) that we need box
sizes of many hundreds of Mpc to measure the power of
the matter accurately. For CDM (gravity) alone we can now
run simulations that are large enough to capture most of the
variations from large scale modes (Neto et al. 2007). However,
we cannot yet run large enough hydrodynamic simulations
with the ~20 kpc cell size required in the IGM (Meiksin & White
2004), although we could instead run an ensemble
of simulations, each with a different mean density
(Mandelbaum et al. 2003). Meiksin & White (2004), for example,
study the convergence properties of the flux power
spectrum and autocorrelation function and recommend a
box size of 25 Mpc, but only for high redshifts (z > 3),
and even then they do not find convergence to better than
10%. Similarly, Bagla & Ray (2005) study the effects of box
size on halo mass functions and indicate that a minimum
box size of several 100 h^-1 Mpc is needed.

A secondary goal of this work is to make it easier to 
obtain validated and reproducible results on the Lya forest, 
in accord with the sentiments of the "Cosmic Code Comparison
Project" (Heitmann et al. 2007). Hence we deliberately
include many tables and figures to aid comparisons with 
other simulations. 



In §2 below, we briefly describe the simulation code and 
parameters we have adopted. In §3 we describe the statistics 
of the cold dark matter distribution. §4 describes the statis- 
tics of the flux in the Lya forest including the mean flux, 
flux distribution, and line b-values and column densities. In
§5 we give the power of the flux spectra and the autocorre- 
lation. In §6 we give the velocity field, baryon temperature 
and density. In §7 we show how putting more heat into a 
simulation makes its Lya forest appear like a simulation of 
twice the box length. In §8 we discuss an ambiguity present 
in the way flux power is calculated. In §9 we discuss how cell 
size, or resolution changes the Lya forest statistics. In §10 
we show how the different statistics converge on the values 
in large boxes and we compare to data. In §11 we review the 
physical causes of the changes we saw with box size. The ap- 
pendices contain technical details: A how we make spectra, 
B how we evolve them, C how we make extended sight lines, 
and D the lack of realistic variations in the density field. 

Overall, the comparison with data shows some large dif- 
ferences that make it hard to see how simulations will be able 
to exactly match the current Lya forest data at z = 2 using 
the popular cosmological and astrophysical parameters. 



2 ENZO IGM SIMULATIONS 



The numerical simulations (Bodenheimer et al. 2007)
that we describe in this paper use the Eulerian
hydrodynamic cosmological code ENZO (Bryan et al.;
Norman et al. 2007). The simulations contain both CDM
and baryons in the form of gas, but no stars. The simulations
were all run with the same cosmological parameters: a
flat geometry Ω_total = 1, comprising a vacuum energy density
of Ω_Λ = 0.73, Ω_m = 0.27 (CDM plus baryons), a baryon
density of Ω_b = 0.044, a Hubble constant of H_0 = 71 km
s^-1 Mpc^-1 and an initial power spectrum scalar slope of
n_s = 1.0 with a current amplitude of σ_8 = 0.9.

The ENZO code follows the evolution of the gas using
non-equilibrium chemistry and cooling for hydrogen and
helium ions (Abel et al. 1997; Anninos et al. 1997). After
reionization at z = 6, photoionization is provided using the
Haardt & Madau (2001) volume average UV background
(UVB) from an evolving population of galaxies and QSOs.
This gives 1.330 x 10^-12 photoionizations per second per
H I atom at z = 2 and 1.041 x 10^-12 photoionizations per
second at z = 3. The simulations are optically thin so that
all cells experience the same UV intensity at a given time.
We do not treat the transfer of radiation inside the volume,
and we include no feedback from individual stars or QSOs
except for that implied by the uniform UVB.

As in J05, we use two parameters to describe the intensity
of the UVB. The parameter γ_912 is the rate of ionization
per H I atom in units of the Haardt & Madau model discussed
above, while X_228 measures the heat input per He II
ionization, again in units of the rate for the Haardt & Madau
spectrum.

We initiate the simulations using an Eisenstein & Hu
(1999) power spectrum for the dark matter perturbations,
that we insert at z = 99. The simulated volumes are all
cubes with strictly periodic boundary conditions. Hence the
power is input at a finite number of discrete wavenumbers.
When we increase the box size, we insert the new modes that
now fit inside the box, but we do not change the amplitudes
of the smaller modes.
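
As a minimal sketch of this point (it is not the ENZO initial-conditions generator), the one-dimensional wavenumbers allowed by a periodic box of side L are k_n = 2πn/L. The helper name `allowed_modes` and the number of modes listed are illustrative assumptions; the two box sizes are those of simulations A2 and A.

```python
import math

def allowed_modes(box_mpc, n_max):
    """Wavenumbers k_n = 2*pi*n/L (Mpc^-1) of the first n_max modes that fit in a periodic box."""
    return [2.0 * math.pi * n / box_mpc for n in range(1, n_max + 1)]

small = allowed_modes(38.4, 4)   # box of simulation A2
large = allowed_modes(76.8, 8)   # box of simulation A, twice the side length

# Every mode of the small box is also a mode of the large box (its even harmonics) ...
shared = [k for k in large if any(math.isclose(k, ks) for ks in small)]
# ... while the odd harmonics of the large box are new, including its fundamental.
new = [k for k in large if not any(math.isclose(k, ks) for ks in small)]
```

Doubling the box therefore keeps all the old wavenumbers and adds modes in between them, the longest of which has twice the wavelength of the old fundamental.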

The amplitude of the power that we insert varies 
smoothly with wavenumber. We insert the amplitude ex- 
pected for the universe as a whole, with no random varia- 
tions associated with the finite box sizes. Since all our simu- 
lations use the same cosmological parameters, they all have 
exactly the same initial power for all modes that fit inside 
their box. The power in the simulations is not adjusted to 
include the variations in mean density that we see in the uni- 
verse on the scale of the boxes. Hence, all the simulations 
are more similar to the mean of the universe than would 
be any observational measurement. We discuss this more in 
Appendix D. In this limited sense, the boxes contain infor- 
mation on scales much larger than their sizes. 

We initiated all simulations using the same random 
number seed to generate the phases of all the modes. The 
phases are assigned to modes according to the mode direc- 
tion and size measured in units of the box size (and not 
Mpc). Hence, in box units, the simulations have the same 
distribution of matter on the largest scales, as we show be- 
low. 

We ran the simulations to z = 2, and all the results that 
we give refer to z = 2, except for specific cases discussed in 
Appendix B.

2.1 Series of Simulations 

We will discuss three series of simulations with parameters 
listed in Table 1. The main A series have identical input
parameters except for the box size. The larger boxes con- 
tain more total volume and mass and they contain power on 
scales that does not fit in the smaller boxes. 

The A and KP series simulations have identical cosmological
parameters and exactly the same comoving cell size of
53.25 h^-1 kpc or 75 kpc comoving. Each simulation has one CDM
particle for each cell initially, and each dark matter particle,
in each simulation, has a mass of M_CDM ≈ 9.5 x 10^6 h^-1 M_sun.
All the simulations are fully constrained by the input parameters
since we do not re-scale any of the simulation outputs,
such as the densities, H I, or fluxes.
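
The particle mass follows from the cell size and the cosmology, since each cell starts with one CDM particle. The sketch below is a cross-check rather than anything taken from the simulation code; the value used for the critical density is the standard one and is an assumption not stated in the text, and the names are illustrative.

```python
RHO_CRIT = 2.775e11            # critical density today, h^2 M_sun Mpc^-3 (standard value, assumed)
OMEGA_M, OMEGA_B = 0.27, 0.044 # matter and baryon density parameters from the text
CELL_H_MPC = 53.25e-3          # comoving cell size in h^-1 Mpc

def cdm_particle_mass():
    """Mass in h^-1 M_sun of one CDM particle when each cell holds one particle at the mean density."""
    omega_cdm = OMEGA_M - OMEGA_B          # CDM only: matter minus baryons
    return omega_cdm * RHO_CRIT * CELL_H_MPC ** 3

mass = cdm_particle_mass()
```

With these inputs the mass comes out close to 9.5 x 10^6 h^-1 M_sun.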

Each A and KP series simulation is a cube with side
length N x 75 kpc. The A series simulations differ in size
by factors of two, from the largest simulation A with N^3 =
1024^3 cells to the smallest A7 with N^3 = 32^3. The A simulation
has a comoving box side length of 54.528 h^-1 Mpc or 76.8
Mpc, while the A7 has sides of 1.704 h^-1 Mpc or 2.4 Mpc. The
box for simulation A is 32 times larger in each dimension
than A7, giving it a volume 2^15 = 32,768 times larger. We
also ran a simulation with 16^3 cells but found it significantly
different and of no value to us.
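
The box sizes quoted above can be reproduced from the cell count alone. This is a small arithmetic sketch; the function name is illustrative, and h = 0.71 follows from the Hubble constant given in the text.

```python
H = 0.71            # dimensionless Hubble parameter for H0 = 71 km/s/Mpc
CELL_KPC = 75.0     # comoving cell size in kpc

def box_side_mpc(n_cells):
    """Comoving side length in Mpc of a cube with n_cells cells along each edge."""
    return n_cells * CELL_KPC / 1000.0

for name, n in [("A", 1024), ("A2", 512), ("A3", 256), ("A4", 128), ("A6", 64), ("A7", 32)]:
    side = box_side_mpc(n)
    print(f"{name}: {side:.1f} Mpc = {side * H:.3f} h^-1 Mpc")

volume_ratio = (1024 // 32) ** 3   # volume of A relative to A7
```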

We discussed simulations A, A2, A3 and A4 in J05 for
other purposes. We do not use the label A5 because this was
a version of A4 described by J05. In J05 we noted a problem
with A2. We have now re-run this simulation and we found
that the problem was incorrect joining of the sub-grids of
A2 presented in J05. The results presented for A2 in J05
were incorrect for the power spectrum, but correct for the
flux and b-values.

In the next few sections we discuss the A series simulations
alone. We discuss the three simulations A2kp, A3kp
and A4kp, the KP series, in §7. They explore the effect of
changing the heat input per He II ionization. We discuss
the 4 simulations in the B series, which include A4, in §9,
when we discuss changing the resolution of the simulations
by changing the cell size.

Table 1. Parameters input to specify the simulations. Box and
cell sizes are comoving distances. Simulation A4 is in both the A
series that explores box size and the B series that explores the
cell size. The KP series are variants on the A series with different
UVB intensity (γ_912) and heating per He II ionization (X_228).

Simulation    N         Box size   Cell size   γ_912     X_228
or box name   (cells)   (Mpc)      (kpc)

A             1024      76.8       75          1.0       1.8
A2            512       38.4       75          1.0       1.8
A3            256       19.2       75          1.0       1.8
A4            128       9.6        75          1.0       1.8
A6            64        4.8        75          1.0       1.8
A7            32        2.4        75          1.0       1.8

A2kp          512       38.4       75          0.9217    2.165
A3kp          256       19.2       75          0.836     2.579
A4kp          128       9.6        75          0.738     3.045

B2            512       9.6        18.75       1.0       1.8
B             256       9.6        37.5        1.0       1.8
A4            128       9.6        75          1.0       1.8
B3            64        9.6        150         1.0       1.8



3 THE CDM DENSITY DISTRIBUTION 

We discuss how the CDM density distribution varies with 
box size in the A series of simulations at z = 2. We do not
discuss the baryons until §4. 

In Figure 1 we show the frequency distribution of the
CDM in the cells. We define the normalized-density of CDM
as

    \delta_{\rm CDM} = \rho_{\rm CDM} / \bar{\rho}_{\rm CDM}    (1)

where the denominator is the mean density of CDM in the
universe at z = 2, which is also the mean density of the
CDM in all our simulations. The shape of the curve is expected
from semi-analytical work (Lacey & Cole 1994) on
the formation of dark matter halos.

The distributions of the densities for all simulations continue
towards much lower densities than we show. The distributions
become steeper with decreasing density, but otherwise
we see no conspicuous features. Since the simulations
initially have an average of one CDM particle per cell, most
cells with δ_CDM < 1 contain no particles. Their densities can
be non-zero because density is defined by distributing mass
with immediately neighboring cells whenever a particle is
not exactly centered in its cell. Hence a particle at the corner
of a cube would contribute δ_CDM = 0.125 to each of the
eight cells.
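
The sharing of mass between neighboring cells can be sketched with a generic cloud-in-cell weighting for a single unit-mass particle. This is an illustration under simple assumptions (unit cells centered at half-integer coordinates, no periodic wrapping) and is not the ENZO implementation; the function name is hypothetical.

```python
import itertools
import math

def cic_weights(x, y, z):
    """Return {(i, j, k): weight} for a unit-mass particle at (x, y, z).

    Cell (i, j, k) spans [i, i+1) on each axis and is centered at i + 0.5.
    The weights are linear in the distance to the cell centers and sum to 1.
    """
    base, frac = [], []
    for coord in (x, y, z):
        shifted = coord - 0.5            # position measured from the cell centers
        lower = math.floor(shifted)      # index of the lower neighboring cell
        base.append(lower)
        frac.append(shifted - lower)     # fractional distance towards the upper cell
    weights = {}
    for offsets in itertools.product((0, 1), repeat=3):
        w = 1.0
        for d, f in zip(offsets, frac):
            w *= f if d else (1.0 - f)
        if w > 0.0:
            cell = tuple(b + d for b, d in zip(base, offsets))
            weights[cell] = w
    return weights

corner = cic_weights(1.0, 1.0, 1.0)   # particle on a corner shared by eight cells
center = cic_weights(0.5, 0.5, 0.5)   # particle exactly at a cell center
```

A particle on a shared corner gives each of the eight surrounding cells a weight of 0.125, while a particle at a cell center deposits all of its mass in that cell.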

In Table 2 we list the percentage of the cells with zero
density. This is 13.78% in simulation A, increasing systematically
to 14.72% in A7. These cells have no immediate
neighbors containing a CDM particle and we assigned them
a nominal density of 10^-?? particles per cell, which we can
ignore. We also list the minimum and maximum density in
any cell.

Figure 1. The frequency distribution of density of CDM, δ_CDM,
per cell. We plot the fraction of the cells (or volume) in a given
simulation that have CDM density, in units of the cosmological
mean density, in bins of width log δ_CDM = 0.1. The A simulation
is shown with a solid (black) line and the other simulations are
shown using a symbol in each bin, connected by straight line
segments, as follows: A2 (violet pluses), A3 (blue stars), A4 (green
squares), A6 (orange diamonds) and A7 (red triangles), where the
larger simulations extend farther to the right.

The lowest density portions of the distribution are not
physically realistic because we have too few particles per cell
to accurately simulate densities far below the mean. We are
more interested in the gravitational potential than in the
density in a cell, and the potential is much smoother than
the density distribution. The dark matter is smoothed twice,
once when CDM is assigned to cells using the piecewise linear
cloud-in-cell algorithm (Hockney & Eastwood 1988) and
again when the potential is calculated. Where we have low
dark matter densities, we may have low-level fluctuations
in the potential due to particle discreteness. At worst, the
CDM particles become mildly collisional.

The simulations in larger boxes contain higher densities, 
and hence they extend further to the right in Fig. 1. However,
on the scale of this plot, all the simulations contain
approximately the same frequencies at each density, with 
no variation with box size, which suggests that we will see 
little effect of box size on the statistics of the flux beyond 
those coming from the larger volumes and higher maximum 
densities in larger boxes. We do, however, see one minor 
difference between the boxes. Several simulations have fre- 
quencies lower by about a factor of two for the densities 
within a factor of a few of the largest value found in that 
simulation. This is conspicuous for A7 (red), A6 (orange) 
and A3 (blue), but not for A4 (green) and A2 (violet) which 
seem to follow A (black). 

It is striking that we see almost no change with box 
size in the probability distribution function (pdf) of the 



Table 2. Statistics of the distribution of the density of CDM
in the A-series simulations. Zeros are the percentage of cells in
the simulation with zero density. NonL gives the percentage of cells
in the simulation with δ_CDM > 3.

Name   N         Zeros    Min             NonL    Max
       (cells)   (%)      δ_CDM           (%)     δ_CDM

A      1024      13.78    3.68 x 10^-11   4.67    1.64 x 10^?
A2     512       14.07    2.32 x 10^-10   4.68    1.41 x 10^?
A3     256       14.37    4.86 x 10^-9    4.71    5.71 x 10^?
A4     128       14.44    3.92 x 10^-8    4.79    3.12 x 10^?
A6     64        14.67    3.50 x 10^-8    4.85    8.25 x 10^?
A7     32        14.72    2.09 x 10^-7    4.91    3.01 x 10^2

CDM density per cell for the overdensities responsible for
the Lyα forest absorption, approximately 0.5 < δ_CDM <
16. We obtain the typical baryon density corresponding to
a given log N_HI from Schaye (2001, Eqn. 10), who developed
an analytic model that gives an excellent fit to simulations.
Using the cosmology for the A series (Ω_m = 0.27,
Ω_b = 0.044, H_0 = 71 km/s/Mpc) and the temperature-density
relation from simulation A2 (T = 12910 K (ρ_b/ρ̄_b)^0.?, J05
Table 9) we find at z = 2

    N_{\rm HI} \simeq 7.8 \times 10^{12} (\rho_b/\bar{\rho}_b)^{?}    (2)

Lines in the Lyα forest with log N_HI = 12.5 - 14.5 cm^-2
then come typically from baryon overdensities of ρ_b/ρ̄_b =
0.5 - 15.7. A larger range of densities is involved in making
significant absorption in the Lyα forest (Schaye et al. 2003).
For this discussion we assume that the baryon and CDM
density fluctuations are similar in amplitude. Gnedin et al.
(2003) show that the baryon fluctuations are similar to the
CDM fluctuations on scales larger than the filtering scale,
with δ_b/δ_CDM = exp(-k^2/k_F^2), where the filtering scale k_F
depends on the integral over time of the Jeans length and is
approximately 0.055 s/km (11 Mpc^-1) at z = 2 (their Fig.
2).
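
The two end points quoted above (log N_HI = 12.5 and 14.5 at overdensities of 0.5 and 15.7) are enough to recover the power law that connects them. The sketch below is an inference from those rounded numbers only, not a re-derivation of the Schaye (2001) model; the function names are illustrative.

```python
import math

# End points quoted in the text: log10 of N_HI (cm^-2) and baryon overdensity.
LOG_N_LOW, LOG_N_HIGH = 12.5, 14.5
DENS_LOW, DENS_HIGH = 0.5, 15.7

def implied_power_law():
    """Slope and normalization of N_HI = C * overdensity**slope through the two quoted points."""
    slope = (LOG_N_HIGH - LOG_N_LOW) / math.log10(DENS_HIGH / DENS_LOW)
    log_c = LOG_N_LOW - slope * math.log10(DENS_LOW)
    return slope, 10.0 ** log_c

slope, norm = implied_power_law()

def overdensity(log_nhi):
    """Baryon overdensity that this power law assigns to a given log10 N_HI."""
    return 10.0 ** ((log_nhi - math.log10(norm)) / slope)
```

Because the quoted overdensities are rounded, the slope (a little above 1.3) and the normalization (close to 8 x 10^12 cm^-2) obtained this way are indicative only.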

The statistics on the distribution of the density of CDM
in Table 2 show that both the maximum and minimum density
of CDM in any cell increase systematically with the
box size. The minimum is not relevant to us; since we work
with the density and not the log(density), these values are
all essentially zero, but the maxima are important for the
flux spectra.

In Figure 2 we zoom in on the densities that are more
important for the flux in the Lyα forest, and we expand the
sensitivity by dividing by the frequencies found in simulation
A. We now see systematic trends with box size. The larger
boxes have higher frequencies of small densities, approximately
log(δ_CDM) < -0.3, and lower frequencies of higher
densities. All boxes have approximately the same frequency
for log(δ_CDM) = -0.3, near the most common density. At
densities below the most common, the largest difference from
the A simulation is seen at lower densities in the smaller
boxes. Above the most common densities, the largest differences
from A are seen near the mean density. The larger
the box, the less the deviation from A. We anticipate that
these differences will manifest as changes in the baryon density
and hence the Lyα forest.

In Table 3 we list two further statistics showing the
changes in the CDM density per cell with box size, the
RMS and the mean of the absolute difference (MAD),
each relative to the value in box A and averaged over
-2 < log(δ_CDM) < 2. Both statistics decrease by about
a factor of two for each doubling in box size, reaching an
RMS of only 0.7% for simulation A2. We will see a similar
rate of convergence for other statistics of the Lyα forest
(McDonald & Miralda-Escudé 2001).

Figure 2. As Fig. 1 but showing the frequency of the densities of
CDM in units of the frequency in simulation A. The points from
the larger boxes are higher at log δ_CDM = 1.

Table 3. Statistics of the Distribution of the Density of CDM
Relative to Simulation A

Name   MAD     RMS
       (%)     (%)

A      0.00    0.00
A2     0.57    0.70
A3     1.23    1.44
A4     3.01    3.32
A6     7.11    8.23
A7     14.30   17.79


3.1 Distribution of the variance of the density of
CDM amongst sight lines

We will be examining the power of the CDM in the next section
because this helps us understand how the simulations
in general, and particularly the power of the flux, change with
box size. Here we look first at the variance of the CDM because
this is related to the sum of the power over all modes.

In Figure 3 we show the xy faces of the series A simulations.
For each position in the xy plane, we show

    s_l^2 = \frac{1}{N} \sum_{z} (\delta_{\rm CDM} - 1)^2    (3)

the mean value of (δ_CDM - 1)^2 along the cells parallel
to the z axis, which goes into the page. We consider these
rows as sight lines, which we label with the subscript l to
indicate a choice of both x and y. We show this quantity
because N s_l^2 is the contribution of that sight line to the
variance of δ_CDM in the whole simulation box, since

    N^3 \, {\rm Var}(\delta_{\rm CDM}) = \sum_{\rm cells} (\delta_{\rm CDM} - \bar{\delta}_{\rm CDM})^2
                                       = \sum_{\rm cells} (\delta_{\rm CDM} - 1)^2 = N \sum_{l} s_l^2    (4)

where N^3 is the number of cells in the simulation box, and
δ̄_CDM = 1 if and only if it is the mean of the δ_CDM values
of all cells in the box, following the definition of δ_CDM in
Eqn. 1.

Table 4. Statistics of the distribution of s_l^2 amongst sight lines

Name   Min    Mode   Mean     Max

A      0.61   3.55   175.78   1.14 x 10^?
A2     0.58   1.78   146.12   8.39 x 10^5
A3     0.50   0.71   122.84   2.49 x 10^?
A4     0.43   0.71   100.66   1.11 x 10^?
A6     0.36   0.71   53.38    2.24 x 10^?
A7     0.29   0.71   26.61    3.42 x 10^?

The quantity s_l^2 tells us how much that sight line contributes
to the mean power of all sight lines. The quantity
s_l^2 is not the variance along each sight line, σ_l^2, since that
is the mean of (δ_CDM - δ̄_CDM,z)^2, where δ̄_CDM,z is the
mean along each sight line. These means differ from sight
line to sight line, and can be very different from unity.
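
The distinction between the per-sight-line quantity of Eqn. 3 and the ordinary variance along a sight line can be checked numerically. The sketch below uses a random lognormal cube as a stand-in for δ_CDM, which is purely an assumption for illustration and not simulation output.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16
delta = rng.lognormal(sigma=1.0, size=(n, n, n))   # toy density field, not simulation output
delta /= delta.mean()                               # normalize so the box mean is 1

# Eqn. 3: mean of (delta - 1)^2 along z for each (x, y), measured about the box mean.
s2 = ((delta - 1.0) ** 2).mean(axis=2)
# Ordinary variance along each sight line, measured about that sight line's own mean.
sigma2 = delta.var(axis=2)

box_variance = delta.var()
```

Summing N times the Eqn. 3 quantity over all sight lines recovers N^3 times the variance of the whole box, and for every sight line it is at least as large as the variance about that line's own mean.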

Figure 3 looks similar to the projection of the density,
since the s_l^2 is largest where we encounter a cell with a high
density. If we shrink the squares from the smaller simulations
to give constant Mpc per mm on the page, then the density
and size of structures look approximately the same in all
simulations, although they are not the same, for example
because the smaller boxes are also smaller in the z direction.

In Figure 4 we show the distribution of the s_l^2 and in Table 4
we give some statistics. We have one s_l^2 for each sight
line parallel to the z axis of each simulation box. Larger
boxes have a lower frequency of sight lines with small s_l^2,
their most common (mode) s_l^2 is larger, they have a higher
frequency of larger s_l^2, and a larger maximum s_l^2. This is
because the larger boxes have more sight lines, each of which
is longer, and there are higher densities in the larger boxes.

The small boxes lack the high density peaks of the larger
boxes because they lack volume, and they lack long modes.
They do not contain enough particles to produce the highest
densities. To make a peak with 10^? particles in a cell, we
must collect particles from 10^? cells, more than are contained
in the A7 simulation.

Bagla & Ray (2005) have explored how the frequency of
high density CDM collapsed structures changes with effective
box size. They use simulations with N^3 = 256^3 CDM
particles in 300 h^-1 Mpc boxes with a softening length of
0.47 h^-1 Mpc. They find that the number of high density
peaks decreases when they truncate the initial power spectra
at lengths less than the full box size. They see a factor
of three fewer collapsed structures with mass 10^?? M_sun when
they truncate the power at 1/4 of the box size. However,
when they truncate at 1/2 the box size they see only an 80%
reduction in the number, showing convergence to the result
expected for a much larger box. This convergence happens
for boxes 3-6 times larger than A.

Figure 3. The contribution to the total variance of the density of CDM from each sight line. The sight lines are parallel to the z axis
that goes into the page. Simulations are, from upper left to lower right, A, A2, A3, A4, A6 and A7. Pixels that have variance below the
mean are shown as white. Darker pixels have larger log s_l^2.

Figure 4. Distribution of the s_l^2. The s_l^2 is measured along sight lines parallel to the z axis and of length equal to the size of each box.
The vertical axis shows the fraction of sight lines (pixels in the xy plane) with variance in bins of width 0.1 in log s_l^2. The vertical scale is log
(fraction) on the left panel and linear fraction on the right. The larger boxes have larger modes (right panel), and extend to larger
values (left panel). Both A and A2 have approximately the same maximum s_l^2.



3.2 Power of Normalized CDM Density 

We compute the Fourier transform D(k) of δ_CDM - 1, the
normalized-density of CDM minus one,

    D(k) = \frac{1}{(N \Delta u)^{1/2}} \int (\delta_{\rm CDM}(u) - 1) \, e^{-iku} \, du    (5)


where k is the wavenumber, i is the pixel index, u is velocity
and Δu = cΔz/(1 + z) is the velocity width of a pixel
with redshift width Δz. Subtracting one has no effect on
D(k) except for the mode with zero frequency. We take the
transform of the density along each sight line parallel to and
extending the full length of the z axis. We did not explore
the x and y directions. We use a discrete Fast Fourier Transform
algorithm, and we use

    P(k) = \langle D(k) D^*(k) \rangle  (km s^-1)    (6)

as our estimate for the one-dimensional power, where the
brackets refer to averaging over all sight lines parallel to the
z axis.

Since the density distribution in the box is always
strictly periodic, the power spectrum (of the signal along
sight lines parallel to the z axis) is non-zero for the discrete
set of modes k = 2πs/L_v, where s = 1, 2, 3, ..., N, and
L_v = L H(z)/(1 + z) is the length of each spectrum in velocity
units, corresponding to L comoving Mpc. At redshift
z = 2, H = 201.069 km s^-1 Mpc^-1, and simulation A has
L_v = 5147.37 km s^-1.

In Figure 5 we show the power spectrum of δ_CDM - 1 for
the A series boxes. We begin each spectrum at the left at its
fundamental mode, the smallest wavenumber k_box = 2π/L_v
for that box. For simulation A, k_box = 2π/L_v = 1.22066 x
10^-3 s/km. We end the plots on the right at the Nyquist
frequency, k_cell = 2π/(2 L_cell) = 0.625 s/km, or log k_cell =
-0.20 s/km, where L_cell is the velocity width of a simulation
cell, 5.027 km s^-1 at z = 2.
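
The velocity-space quantities in the last two paragraphs follow directly from the cosmological parameters of §2. The sketch below assumes a flat universe containing only matter and vacuum energy, which is adequate at z = 2; the function names are illustrative.

```python
import math

H0, OMEGA_M, OMEGA_L = 71.0, 0.27, 0.73   # km/s/Mpc and density parameters from Section 2

def hubble(z):
    """H(z) in km/s/Mpc for a flat universe with matter and vacuum energy only."""
    return H0 * math.sqrt(OMEGA_M * (1.0 + z) ** 3 + OMEGA_L)

def velocity_length(comoving_mpc, z):
    """Velocity extent in km/s of a comoving length at redshift z."""
    return comoving_mpc * hubble(z) / (1.0 + z)

z = 2.0
l_box = velocity_length(76.8, z)      # full sight line through simulation A
l_cell = velocity_length(0.075, z)    # one 75 kpc cell
k_box = 2.0 * math.pi / l_box         # fundamental mode, s/km
k_nyquist = math.pi / l_cell          # 2*pi / (2 * L_cell), s/km
```

These inputs give H(2) of about 201.07 km/s/Mpc, a box length of about 5147.4 km/s, a cell width of about 5.03 km/s, a fundamental mode of about 1.2207 x 10^-3 s/km and a Nyquist wavenumber of about 0.625 s/km, for which log k is close to -0.20.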

The power is larger at all k in the larger boxes, which
we expect because the variance-like quantity s_l^2 in Fig. 4
is also larger. The increase in power with box size is most
pronounced on the largest scales. The power is larger in 
larger boxes even on the smallest scales. We now present a 
figure that shows that the increase in the power with box size 
is intrinsic to the density distribution, and not an artifact 
of the length of the sight lines or the number of sight lines 
through the boxes. 

In Figure 6 we show power spectra obtained from simulation
A6 using a reduced number of shorter sight lines.
We divide the A6 cube into the eight sub-cubes, each of size
A7, that together exactly fill A6. We made spectra from
all the sight lines restricted to each sub-cube, and we took
the power spectrum of each. We then distribute these power
spectra randomly into eight means, each of which contains
some sight lines from each of the eight sub-cubes, and
the same total number of sight lines as does the mean power
spectrum of A7. We see that the power in the sub-cubes is
distributed about that in A6, and not A7. This shows that
the extra power in A6 is intrinsic to the density distribution
in A6, and not from the length or number of sight lines. The
dispersion of the power in the 8 spectra gives one indication
of the random error in the power of the A7 simulation.

Figure 5. The 1D power spectra of δ_CDM - 1, taken along all
1D sight lines parallel to the z axis of each A series box. The
horizontal axis is log k (s/km).
The vertical lines show the largest modes for each box, those with
k_box = 2π/L_v. The power spectra for the larger boxes are larger and extend
farther to the left (large distances).
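
The sub-cube test described above can be sketched as follows. The random cube stands in for the parent box, the power normalization P = (Δu/n)|FFT|^2 is an assumption chosen for the sketch, and dealing the shuffled spectra into eight equal groups is one simple way to obtain means that mix sight lines from all the sub-cubes; none of this is taken from the authors' code.

```python
import numpy as np

def sightline_power(delta, du=1.0):
    """1D power of (delta - 1) along axis 2 for every sight line (assumed normalization)."""
    n = delta.shape[2]
    return (du / n) * np.abs(np.fft.fft(delta - 1.0, axis=2)) ** 2

def shuffled_subcube_means(delta, rng):
    """Split a cube into 8 half-size sub-cubes, take the power of every short sight line,
    then deal the spectra at random into 8 equal groups and average each group."""
    m = delta.shape[0] // 2
    spectra = [
        sightline_power(delta[i:i + m, j:j + m, k:k + m]).reshape(-1, m)
        for i in (0, m) for j in (0, m) for k in (0, m)
    ]
    spectra = np.concatenate(spectra)                  # 8 * m * m spectra, each with m modes
    spectra = spectra[rng.permutation(len(spectra))]   # random assignment to groups
    return spectra.reshape(8, m * m, m).mean(axis=1)

rng = np.random.default_rng(2)
cube = rng.lognormal(size=(16, 16, 16))   # toy stand-in for the parent box
cube /= cube.mean()
means = shuffled_subcube_means(cube, rng)  # eight mean spectra, one per row
```

Each row of `means` averages as many sight lines as a box of half the side length would contain, so the scatter between rows gives a rough sense of the random error in such a box.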



3.3 Power from Density Peaks 

Parseval's Theorem states that sum of the power at all 
modes is proportional to the sum of the square of the signal. 
Using the signal from Eqn. |4l we have 



-y 



' ^ 2 



(k)dk (7) 



where is the number of cells in a box, and P(k) has 
velocity units that cancel the inverse velocity units on the 
kbox- This can also be written as 



1 



(8) 



which shows that the mean value of $s_l^2$, averaged over all
cells in the box, is equal to the sum of all modes in the
power (which is also averaged over all sight lines), divided
by the length of each spectrum. This is the normalization
used by McDonald et al. (2006) (Eqn. 12). It is the "System
2" normalization from Bracewell (1986) (p. 7).
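The statement above, that the mean square of the signal equals the sum of the power over all modes divided by the length of the spectrum, can be checked numerically. The following is a minimal Python sketch, not the code used for the simulations. It assumes a discrete normalization $P(k_j) = (L/n^2)|\sum_x s_x e^{-2\pi i j x/n}|^2$, chosen only because it reproduces that statement; the density values, the 5 km/s pixel size and the function name are illustrative assumptions.

```python
import cmath
import math


def power_spectrum_1d(signal, length):
    """Return P(k_m) = (length / n**2) * |DFT(signal)_m|**2 for m = 0..n-1.

    This normalization is an assumption, chosen so that
    sum(P) / length equals the mean square of the signal.
    """
    n = len(signal)
    power = []
    for mode in range(n):
        amplitude = sum(
            s * cmath.exp(-2j * math.pi * mode * x / n)
            for x, s in enumerate(signal)
        )
        power.append(length * abs(amplitude) ** 2 / n ** 2)
    return power


# Hypothetical normalized CDM densities along one short sight line.
delta_cdm = [0.4, 0.9, 1.3, 6.0, 0.7, 0.3, 0.5, 0.8]
signal = [d - 1.0 for d in delta_cdm]   # the signal is delta_CDM - 1
length_kms = 5.0 * len(signal)          # assumes 5 km/s per pixel

power = power_spectrum_1d(signal, length_kms)
mean_square = sum(s * s for s in signal) / len(signal)
sum_power_over_length = sum(power) / length_kms
```

With this normalization `mean_square` and `sum_power_over_length` agree to rounding error, so a single large value in `delta_cdm` raises the power summed over all modes in direct proportion to its square.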



The $s_l^2$ values show how the sum of the power at all
modes is distributed amongst the sight lines. We saw in Fig.
4 and Table 4 that a few sight lines have $s_l^2$ values vastly
larger than the mean. This means that these few sight lines
also dominate the total power of each simulation.

In Figure[7]we show a single 1024 pixel line of sight from 
simulation A with a density peak of 5c dm— 3 x 10^. This 
Figure 6. The 1D power spectra of $\delta_{CDM} - 1$ for sight lines of
length 64 cells drawn from the A6 ($128^3$) simulation. The solid
(black) line extending farthest to the left is the power for all sight
lines through A6. The eight short dashed lines show the power
for sub-samples, each with $64^2$ sight lines of length 64 cells. The
lowest line on the left (short solid red) is the power for the A7
box, which also comprises $64^2$ sight lines each of length 64 cells.



peak increases the power about 1000 times over a wide range
of wavenumbers. The density spike is narrow in velocity
space, and hence wide in k space (e.g. Fig. 6.2 of Bracewell
(1986)). In the lower right panel we see that smoothing the
density spike removes most of the excess power.

We can estimate the amount of power added quantitatively.
A typical sight line from simulation A has $s_l^2$ of
order 3 (Table 4). One pixel with a normalized-density of
1700 increases the variance to $(3 \times 1023 + 1700^2)/1024 \approx
2800$, approximately the increase we see in the integrated
power. Hence, a single sight line with a normalized-density
5cDM~ 10^ will contain as much power as the whole cube of 
10^ sight lines each with typical variance. Simulation A con- 
tains 77 cells with Scdm> 10^ and 12 with Scdm> 
sufficient that these few cells, and the sight lines that pass 
through them, will dominate the power spectrum of the (un- 
smoothed) CDM. 
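The arithmetic in the estimate above can be reproduced directly. This is a small Python sketch using only the numbers quoted in the text (a typical variance of order 3, 1024 pixels, one pixel with normalized-density 1700); the variable names are illustrative.

```python
n_pixels = 1024            # pixels along one sight line of simulation A
typical_variance = 3.0     # typical per-sight-line variance, of order 3
peak_density = 1700.0      # normalized-density of the single high pixel

# Replace one typical pixel by the peak and recompute the mean square.
variance_with_peak = (
    typical_variance * (n_pixels - 1) + peak_density ** 2
) / n_pixels

# Factor by which the single peak raises the variance of the sight line.
increase_factor = variance_with_peak / typical_variance
```

The variance rises to roughly 2800, an increase of order 1000 over the typical value, in line with the excess power seen for the sight line in Figure 7.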

In Figure 8 we show how the sum of the $s_l^2$ of the sight
lines increases with the number of sight lines included. We
start on the left with the sight lines with the smallest $s_l^2$,
ending on the right with those with the largest. We see that,
for all simulations, the few sight lines with the largest $s_l^2$
completely dominate the total. Depending on the simulation,
90% of the total comes from only 2 to 10% of the
sight lines. The total $s_l^2$, and hence the total power, is an
unstable quantity, which can change significantly as the few
highest density peaks change in number and density.
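The cumulative curve described above can be built from a list of per-sight-line variances. The Python sketch below is illustrative only: the function names and the toy values are assumptions, not data from the simulations.

```python
def cumulative_share(variances):
    """Cumulative fraction of the total, with sight lines in increasing order."""
    ordered = sorted(variances)
    total = sum(ordered)
    running = 0.0
    shares = []
    for value in ordered:
        running += value
        shares.append(running / total)
    return shares


def fraction_of_lines_for_share(variances, share=0.9):
    """Smallest fraction of sight lines, largest first, supplying `share` of the total."""
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running >= share * total:
            return count / len(ordered)
    return 1.0


# Toy sample: 95 typical sight lines and 5 that cross a density peak.
toy_variances = [3.0] * 95 + [3000.0] * 5
shares = cumulative_share(toy_variances)
top_fraction = fraction_of_lines_for_share(toy_variances, share=0.9)
```

In this toy sample the 95 typical sight lines together contribute under 2% of the total, and 5% of the sight lines supply 90% of it, the same qualitative behaviour as in Figure 8.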

We have shown that the power of the CDM density is 
larger in larger boxes primarily because larger boxes contain 
rare regions with higher density. We also see higher power 
because we use longer sight lines in the larger boxes. As 
Figure 8. The cumulative sum of the contribution of the sight
lines through a simulation to the total power of all sight lines.
The vertical axis is the cumulative sum of $s_l^2$. The horizontal axis
shows the rank of the sight line ordered in increasing $s_l^2$, which
is its contribution to the total variance. We express the rank as a
fraction of the total number of sight lines, to simplify comparison
of the simulations. The vertical axis is the sum of the $s_l^2$ in all
sight lines with $s_l^2$ smaller than indicated on the horizontal axis,
divided by the total $s_l^2$. From top to bottom at a fraction of 0.8
the boxes are A (black), A7 (red), A2 (violet), A6 (orange), A3
(blue) and A4 (green).

Bagla & Ray (2005) showed, there is some effect from having
longer modes in the larger boxes. However, Figs. 1 and 2 showed
only small changes in the frequencies of different densities
per cell, which suggests that the extra long modes have a
small effect on the quantities that we are evaluating.

When we examine the flux transmitted through the 
IGM we are most interested in densities near the mean. 
Although the larger boxes have higher maximum densities, 
they have slightly lower portions of their volume above a 
moderate density. In Table 2 the column NonL gives the
fraction of cells with CDM density exceeding 3 times the 
mean. This fraction decreases systematically with box size, 
from 4.91% for A7 to 4.67% for A. 



4 STATISTICS OF THE FLUX IN THE Lya 
FOREST 

We make flux spectra using code described in Zhang et al.
(1997) and J05. We make each spectrum along a row of cells
parallel to the z axis of the boxes, just as we did for the $s_l^2$
values that describe the variance of $\delta_{CDM}$ in §3.1. We use a
number of pixels equal to the box side length N.

In Figure 9 we show spectra of the flux along some random,
unrelated sight lines through the A series simulations.
We show equal total velocity length for all simulations, and 
hence for A7 we show 32 disjoint spectra, each separated by 
a vertical dotted line. We should ignore the vertical discon- 
Figure 7. The normalized-density of CDM, $\delta_{CDM}$, along a single sight line (208) in simulation A as a function of velocity in the full
length through the box (top left). On the top right we show the 1D power spectrum of this normalized-density. In the lower left panel
we have removed the density peak by smoothing the CDM between 2200 and 2250 km s$^{-1}$ with a box car average of 20 pixels or 100
km s$^{-1}$. The corresponding power is on the lower right. We show the log of the normalized-density to show both the typical variations and
the peak. We take the power of the linear, not the log normalized-density. The vertical scale on the two lower panels is different from that on
the corresponding upper panels.



tinuities where spectra end inside an absorption line. These 
tend to make lines look narrower, especially in the smaller 
boxes where there are more discontinuities. We see two ma- 
jor trends with box size. 

The larger boxes contain large velocity intervals with 
very little absorption. These stretch over many hundreds of 
km/s, longer than the size of the smaller boxes. The larger 
the box, the longer the regions with little absorption. These 
low absorption regions are clearly showing correlation in the 
density on large scales. We would not expect to see them as 
often if we truncated the power to include only short modes. 

The smaller boxes seem to have more absorption in to- 
tal. The values that we list in the figure caption show this is 
correct for the A6 and A7, but the larger simulations have 



nearly identical mean flux. The larger simulations also have
a higher proportion of pixels with flux within a few percent
of the continuum, and they have fewer lines of depth 5 -
50%. The number of the deepest lines seems approximately
constant.



4.1 Statistics of Flux Spectra 

In Table 5 we list statistics of the flux for the spectra from
the simulations. The statistics are from each pixel in all $N^2$
spectra through each box in the z direction. The $\Delta$ values
are the change in mean flux that we should add onto the current
value to obtain that in the next larger box. The overall trend of
the mean flux $\bar{F}$ with box size is hard to discern because
Figure 9. Spectra of the flux from sight lines drawn from the A
series simulations, with the largest, A, at the top, A2 next, through
A7, the smallest, at the bottom. All are shown on the same velocity
scale to aid comparison of line widths. To fill the length of
simulation A, we show 2 disjoint, unrelated spectra for simulation A2,
separated by the vertical line in the middle. For simulation A7
we show 32 randomly chosen spectra from this $32^3$ box. When the
end of a spectrum occurs in an absorption line we see a vertical
discontinuity in the plot. Averaging along the spectra shown, the
mean fluxes are 0.887, 0.907, 0.900, 0.891, 0.870 and 0.821 from the
largest to the smallest box.



Table 5. Statistics of the flux in all spectra from the simulations.
We list the mean flux, the error on the mean, the change in the
mean to obtain the flux in the next larger box and the normalized
variance of the transmitted flux. These statistics all refer to the
flux in each pixel in the box. The column Mode($\bar{F}_l$) is different
and refers to the mean flux per sight line, $\bar{F}_l$, rather than per
pixel. The Mode is the most common of the mean flux values in
bins of 0.0005.



Name    $\bar{F}$    error    $\Delta$    Var($F/\bar{F}$)    Mode($\bar{F}_l$)



tinuous area because the spectra in adjacent sight lines are
highly correlated, and hence the error on the mean flux value 
is much larger than the standard deviation divided by the 
square root of the number of samples. We use a 4x4 tiling 
for A and A2, 3x3 for A3 and A4 and 2x2 for A6 and A7. 
These choices are a compromise between having enough tiles 
to give a small random error and having tiles large enough 
to reduce the inter-tile correlations. 
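The tiling estimate described above can be sketched in Python as follows. This is an illustration, not the procedure used for Table 5: the function names and the toy face are assumptions, and dividing the standard deviation of the tile means by the square root of the number of tiles is one plausible reading of the text.

```python
import math


def tile_means(face, tiles_per_side):
    """Mean of a 2D face (list of rows) inside each tile of a square tiling."""
    step = len(face) // tiles_per_side
    means = []
    for ty in range(tiles_per_side):
        for tx in range(tiles_per_side):
            values = [
                face[y][x]
                for y in range(ty * step, (ty + 1) * step)
                for x in range(tx * step, (tx + 1) * step)
            ]
            means.append(sum(values) / len(values))
    return means


def error_on_mean(means):
    """Sample standard deviation of the tile means over sqrt(number of tiles)."""
    count = len(means)
    average = sum(means) / count
    variance = sum((m - average) ** 2 for m in means) / (count - 1)
    return math.sqrt(variance / count)


# Toy 4x4 face of mean flux per sight line, split into a 2x2 tiling.
toy_face = [
    [0.90, 0.90, 0.88, 0.88],
    [0.90, 0.90, 0.88, 0.88],
    [0.86, 0.86, 0.92, 0.92],
    [0.86, 0.86, 0.92, 0.92],
]
means = tile_means(toy_face, tiles_per_side=2)
tiling_error = error_on_mean(means)
```

Because adjacent sight lines are correlated, the number of independent samples is closer to the number of tiles than to the number of sight lines, which is why the error is taken from the scatter between tiles.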

The error given by tiling captures some of the variation 
in the mean density on large scales across the boxes. However 
it is less than the external error that we would want to use 
when we compare to real spectra because it misses all of 
the variation that we would see if we started each box with 
different random phases, and we allowed each mode to have 
a random amplitude, and it misses the variation due to all 
modes larger than the box. On the other hand, this error 
from tiling is larger than the smallest change that we can 
consider indicative of a trend when we compare a series of 
boxes. This is because the statistics are evaluated from the 
whole of each box and all boxes use the same amplitudes 
and phases for their modes. 

In Table 5 we also list the variance of the flux $F/\bar{F}$
evaluated for all sight lines through the box, where $\bar{F}$ is
the mean flux in the box, and not that in each sight line.
We have $\mathrm{Var}(F/\bar{F}) = N^{-3} \sum ((F/\bar{F}) - 1)^2$, where the sum
is over all $N^3$ pixels from all spectra through one side of
the box. The variance of the flux in each pixel is then
$\mathrm{Var}(F) = \bar{F}^2 \times \mathrm{Var}(F/\bar{F})$. The $\mathrm{Var}(F/\bar{F})$ quantity
decreases systematically with increasing box size, because, as we
will now see, the fraction of pixels with flux < 0.97 is up to
a factor of two smaller in the larger boxes.
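A short Python sketch of the normalized variance defined above, and of its relation to the variance of the flux itself. The flux values are toy numbers and the function name is illustrative.

```python
def normalized_flux_variance(spectra):
    """Return (mean flux, Var(F / mean flux)) over every pixel of every spectrum."""
    pixels = [flux for spectrum in spectra for flux in spectrum]
    mean_flux = sum(pixels) / len(pixels)
    var_norm = sum((f / mean_flux - 1.0) ** 2 for f in pixels) / len(pixels)
    return mean_flux, var_norm


# Toy example: two spectra of two pixels each.
toy_spectra = [[1.0, 0.8], [0.6, 1.0]]
mean_flux, var_norm = normalized_flux_variance(toy_spectra)

# Var(F) follows by multiplying by the square of the box mean flux.
var_flux = mean_flux ** 2 * var_norm
```

Here the mean is taken over the whole box, not per sight line, which is the distinction the text draws.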

In Figure 10 we show the distribution of the flux per
pixel for all spectra through each A series simulation, also
called the flux pdf. In Figure 11 we show the same value,
divided by the fractions for simulation A. The simulations all
have approximately the same frequency of pixels with a flux
of 0.96 - 0.97. The larger simulations have higher frequencies
above 0.97, and lower below. Our impression that the spectra
from the smallest boxes have more absorption is confirmed;
they do have a much larger fraction of pixels with 0.05 <
Flux < 0.9. We also confirm that the larger simulations
have more pixels with very high flux. We see smaller changes
between the larger simulations, indicating convergence by
the size of box A. If this trend continues, then the fractions
for simulation A will be within approximately 5% of those
for a much larger simulation.

We see that the differences between the simulations
decrease to less than 10% for fluxes below 0.02; they all have
the same fraction of their spectra occupied by the bottoms of
saturated absorption lines. This is reasonable because Figs. 1
and 2 show they all have approximately the same frequency
(per Mpc$^3$) of high density regions. The largest differences
between the boxes are for intermediate flux levels from the
sides of saturated lines, or the bottoms of nearly saturated
lines, both of which are a small fraction of the pixels.

In Table 5 we also list Mode($\bar{F}_l$), the mode of the mean
flux values, $\bar{F}_l$, with one mean per sight line. The modes are
the most common mean fluxes when we use bins of 0.0005. In
contrast with the mean flux per pixel, these modes per sight
line show a systematic decrease with increasing box size.
These modes are all much less than the mode of the flux in
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the changes are small compared to the measurement errors 
(discussed below). The underlying trend is apparently for 
larger mean flux in larger boxes, but this seems to reverse 
for the largest 4 boxes, where the flux decreases in the larger 
boxes. 

We estimated the errors on the mean flux values in Table 5
from the standard deviation of the mean flux values
for tiles across the xy face of each box. We use tiles of con- 
Figure 10. The distribution of the flux per 5 km s$^{-1}$ pixel in
the spectra from the A series simulations. We use 200 bins for the
flux, each of size 0.005. The distributions from the larger boxes are
lower on the plot for most fluxes (0.2 < flux < 0.9).



Figure 12. The distribution of the b-values for the lines with
$12.5 < \log N_{HI} < 14.5$ (cm$^{-2}$) and $\tau > 0.05$ from the A series
simulations. We show the fraction of all lines that are in 2 km s$^{-1}$
wide bins (centered at 20, 22, 24 km s$^{-1}$ etc.), or 6 km s$^{-1}$ for
A7. The corresponding best estimate values are shown in Table
6. The results from the larger boxes are lower at b = 25 km s$^{-1}$,
with the exception of the lowest curve which is from the smallest
box.





Figure 11. Distribution of the flux per pixel for A series. We 
show the fraction of pixels divided by those for simulation A. The 
results for larger boxes are nearer to the horizontal line at 1.0 for 
most flux values. 



individual pixels, which is 0.990 to 0.995 for all boxes, as
seen in Fig. 10.



4.2 Statistics of the Lines 

In this section we quantify the types of lines seen in the
simulations. We obtain line statistics by fitting Voigt
profiles as described in Zhang et al. (1997). As in Tytler et al.
(2004) and J05 (§5.1, 6.2) we consider only lines with
$12.5 < \log N_{HI} < 14.5$ cm$^{-2}$. We also limit our discussion
to lines with central optical depths $\tau > 0.05$, which is a new
constraint for this paper.



4.2.1 Line Widths: b-values

In Figure 12 we show the distribution of the b-values. The
distributions show small but clearly systematic changes with
box size. In detail, the larger boxes have wider lines, fewer
narrow lines with $b < 28$ km s$^{-1}$, more lines with
$b > 28$ km s$^{-1}$, and a slightly broader distribution. The
smallest box A7 is an exception to this, presumably because
the fundamental mode in this box is nonlinear at z = 2. The
change in typical line width can come from a combination of
three factors: larger absorbing regions, giving more Hubble
flow across a line; larger peculiar velocities from the increase
in large scale power; and higher temperatures also from the
increase in velocities (Theuns et al. 1999; Bryan et al. 1999).

In Fig. 13 we see that the b-value distribution is sensitive
to the minimum line central optical depth $\tau$. We use
a sample with $12.5 < \log N_{HI} < 14.5$ cm$^{-2}$ and $\tau > 0.05$.
If instead we use a sample with r > 10^''' we see a differ- 
ent fe— value distribution that has a larger fraction of lines 
with b < 27 km s~^ and a smaller fraction of lines with 
larger 6-values. Our sample is the subset of the total which 







Table 6. Estimates of the $b_\sigma$ parameter. The $\Delta$ values are the
value of $b_\sigma$ in the row above minus the value in the current row,
except for A2kp which we subtract from A.



Figure 13. The b-parameter distribution for the A3 box. The
vertical axis is the ratio of the fraction of lines in two samples, 
the sample with r > 0.05 divided by the fraction for the sample 
with T > 10~^. We find all lines with log Nhi> 12.5 (cm'^) and 
plot the ratio of the fraction of lines which have r > 0.05 with the 
fraction of lines which have r > 10^'' as a function of the fe-valuc 
of the line. 
B2 and B3 but we give the value of 0.8 km s$^{-1}$ from A4
because these boxes are all the same size and contain the
same number of Lyα lines of a given $N_{HI}$.

The mode of the b-value distribution is $b_{peak} =
0.9457\, b_\sigma$. We do not list $b_\sigma$ for A7 because this box is barely
large enough to contain a single complete line, and we do not
obtain fits to b-values. The $b_\sigma$ values show a simple trend: a
systematic increase with increasing box size, consistent with
the change in shape of the pdf of the b-values. We discuss
the convergence behavior of this statistic in §10.
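The factor 0.9457 follows from the one-parameter Hui & Rutledge (1999) form, $dn/db \propto b^{-5} \exp(-b_\sigma^4/b^4)$, whose maximum lies at $b = (4/5)^{1/4} b_\sigma$. The Python sketch below illustrates this, together with the closed-form maximum likelihood estimate for an untruncated sample. It is not the fitting code behind Table 6: it ignores the $b < 40$ km s$^{-1}$ cut, and the function names, the random seed and the test value of 24 km s$^{-1}$ are assumptions.

```python
import math
import random


def hui_rutledge_pdf(b, b_sigma):
    """Normalized pdf (4 b_sigma^4 / b^5) exp(-(b_sigma / b)^4) for b > 0."""
    return 4.0 * b_sigma ** 4 / b ** 5 * math.exp(-((b_sigma / b) ** 4))


def mode_of_b(b_sigma):
    """Peak of the pdf: setting d(ln pdf)/db = 0 gives b^4 = (4/5) b_sigma^4."""
    return 0.8 ** 0.25 * b_sigma


def fit_b_sigma(b_values):
    """Maximum likelihood b_sigma for an untruncated sample: b_sigma^4 = n / sum(b^-4)."""
    return (len(b_values) / sum(b ** -4 for b in b_values)) ** 0.25


def draw_b(b_sigma, rng):
    """Draw one b-value by inverting the CDF, F(b) = exp(-(b_sigma / b)^4)."""
    u = rng.random()
    while u == 0.0:          # avoid log(0)
        u = rng.random()
    return b_sigma * (-math.log(u)) ** -0.25


rng = random.Random(1)
true_b_sigma = 24.0          # km/s, hypothetical value for illustration
sample = [draw_b(true_b_sigma, rng) for _ in range(20000)]
fitted_b_sigma = fit_b_sigma(sample)
peak_ratio = mode_of_b(1.0)
```

Working with individual b-values, as here, avoids any dependence on bin choice. A fit restricted to $b < 40$ km s$^{-1}$ would need the truncated likelihood, which has no such closed form.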



lacks broad shallow lines. The shape of the distribution in
the figure comes from the simultaneous requirement that
$\log N_{HI} > 12.5$ (cm$^{-2}$) and $\tau > 0.05$. Narrow lines with
$\log N_{HI} > 12.5$ always have $\tau > 0.05$ and hence they are
not affected by the 0.05 limit. Very broad lines often have
$\log N_{HI} > 12.5$ and $\tau < 0.05$.

In Tytler et al. (2004, Fig. 18) we showed that the
Hui & Rutledge (1999) function gave an excellent fit to the
distribution of b-values from a simulation B, which had
35 kpc cell size, half of that used here. The Hui-Rutledge
function has only one parameter, the $b_\sigma$ that describes the
typical line width.

In Table 6 we give estimates for the $b_\sigma$ values that best
fit the distributions of the lines from each simulation. We
use the maximum likelihood method since it treats the
individual b-values, and not the binned values that we show
in the plots. We make two improvements on J05. First, we
now fit only lines with $b < 40$ km s$^{-1}$ because we want $b_\sigma$
to describe the most common lines, and not the rare broad
features that are hard to see in real spectra because of
photon noise, flux calibration problems and uncertain continua.
When we included all b-values in J05 the $b_\sigma$ was larger by
about 0.5 km s$^{-1}$. Second, we completely sample the boxes,
whereas J05 had few spectra, all of which began at the
lowest density part of that box. The values that we give here
differ from those that we gave in J05 for the same
simulations for these reasons. We estimate the errors using the
tiles, as we did for the mean flux, and again the
same comments apply; external errors are larger, and we can
be interested in differences between boxes that are less than
the external errors. We did not estimate the errors for B,



4.2.2 Column Density Distribution: f(N)

In Figure 14 we show the distribution of the H I column
densities of the lines, relative to the values for box A. Here
the function f(N) is the differential distribution of lines,
per (linear) cm$^{-2}$, and per unit absorption distance X. The
coordinate X(z) is defined such that the density of absorbers
per unit X should be independent of X and z. The number
density of non-evolving objects per unit redshift along a line
of sight is given by (Tytler 1981, Eqn. 3)

$N(z) = N_0 Y^2 (H(z)/H_0)^{-1}$ (9)

where $Y = (1 + z)$ and we define the function

$H(z)/H_0 = E(z)$.

Setting X(z = 0) = 0, X can then be defined (Tytler
1982) using $N(X) = N(z)\, dz/dX = \mathrm{constant}$, which gives

$X(z) = \int_0^z Y^2 E^{-1}(z)\, dz = \int_0^z Y^2 [Y^2 (1 + z\Omega_m) - z(2 + z)\Omega_\Lambda]^{-1/2}\, dz$ (10)

Until this decade it was common to use models with
$\Omega_\Lambda = 0$ and $q_0 = 0$ or 1/2. We now use the cosmological
parameters that we gave earlier, $\Omega_\Lambda = 0.73$ and $\Omega_m = 0.27$,
which at z = 2 give dX/dz = 3.17801. For $q_0 = 0$ the
value of dX/dz = 3 is similar, but for $q_0 = 1/2$ we have the





Table 7. The Column Density Distribution for simulation A. 
f{N) is lines with line central optical depth t > 10~^ per cm~^ 
per unit X, the absorption distance from Eqn. 10. The lines are
counted in bins of width 0.2 in $\log N_{HI}$ (e.g. 11.4 - 11.6), and
we report the value at the listed bin centers (e.g. 11.5). When
we estimate f(N) in bins of width $\log N_{HI} = 0.5$ instead, we find
differences of approximately 0.02 at $\log N_{HI} = 13$ cm$^{-2}$, and 0.05
at $19 < \log N_{HI} < 20$ cm$^{-2}$.
log N_HI    log f(N)     log N_HI    log f(N)
14.9        -14.6523     19.7        -22.4385
15.1        -15.0434     19.9        -22.7923
15.3        -15.4000     20.1        -23.1537
15.5        -15.7948     20.3        -23.5523
15.7        -16.0774     20.5        -24.0321
15.9        -16.3919     20.7        -24.3957
16.1        -16.7003     20.9        -24.8255



significantly different value $dX/dz = 3^{1/2} = 1.73$. A single
sight line through the A box then covers $\delta X = (dX/dz)\,\delta z =
0.163571$ where $\delta z = L_v (1 + z)/c = 0.0514693$ and $L_v$ is the
velocity span of one sight line, 5143.37 km s$^{-1}$.
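The numbers quoted for dX/dz and for the absorption distance covered by one sight line can be reproduced from Eqn. 10. The Python sketch below assumes the standard expansion $E^2(z) = (1+z)^2 (1 + z\Omega_m) - z(2+z)\Omega_\Lambda$ for a universe of matter, curvature and a cosmological constant; the function names are illustrative.

```python
import math

C_KMS = 299792.458  # speed of light in km/s


def e_of_z(z, omega_m, omega_lambda):
    """E(z) = H(z)/H0 for matter, curvature and a cosmological constant."""
    y = 1.0 + z
    return math.sqrt(y * y * (1.0 + z * omega_m) - z * (2.0 + z) * omega_lambda)


def dx_dz(z, omega_m, omega_lambda):
    """Integrand of Eqn. 10: dX/dz = (1 + z)^2 / E(z)."""
    return (1.0 + z) ** 2 / e_of_z(z, omega_m, omega_lambda)


z = 2.0
dxdz_lcdm = dx_dz(z, omega_m=0.27, omega_lambda=0.73)   # cosmology used here
dxdz_q0_zero = dx_dz(z, omega_m=0.0, omega_lambda=0.0)  # q0 = 0
dxdz_q0_half = dx_dz(z, omega_m=1.0, omega_lambda=0.0)  # q0 = 1/2

velocity_span_kms = 5143.37                      # one sight line through box A
delta_z = velocity_span_kms * (1.0 + z) / C_KMS
delta_x = dxdz_lcdm * delta_z
```

The three values of dX/dz come out as 3.178, 3 and 1.73, and `delta_x` is close to 0.1636, matching the text to the quoted precision.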

Larger boxes have larger maximum column densities.
The smallest box A7 has no lines with $\log N_{HI} > 15$ cm$^{-2}$
while the larger boxes have a higher density of lines with
$\log N_{HI} < 17$ cm$^{-2}$. Indeed the trends are rather complex
with e.g., box A showing a noticeably higher density of
systems with $14.5 < \log N_{HI} < 16$ cm$^{-2}$ than all the other
boxes. We see strong correlations between the f(N) values
for similar N values because adjacent sight lines sample
almost the same absorbing gas and hence almost the same
column densities. We also see large deviations when $N_{HI}$
changes by about a factor of ten. Together these features
make it hard to assess the errors and rate of convergence.

In Figure 15 we compare observed values for the column
density distribution to the values from box A. We list values
for box A in Table 7. We have attempted to correct the data
to z = 2 and our X definition.

Numerical simulations have often found too few systems
with high $N_{HI}$ values (Katz et al. 1996; Gnedin 1998;
Gardner et al. 2001; D'Odorico et al. 2006). Simulations
need both sufficient volume to contain the long wavelength
modes to get enough CDM halos (Bagla & Ray 2005), and
they need high enough resolution to make the clumps of
gas that cause Lyman limit systems (LLS). In recent work,
Kohler & Gnedin (2007) are able to reproduce both the



Figure 14. The column density distribution f(N) relative to the
A box. The horizontal axis is $\log N_{HI}$ (cm$^{-2}$).



mean flux in the forest and the column density distribution
of LLS at z = 4 in 4 $h^{-1}$ Mpc boxes with 2 $h^{-1}$ kpc
resolution. They also obtained approximately the real number of
LLS per unit z.

Our simulations lack absorption systems with large column
densities. We lack a factor of 1.2 for $14 < \log N_{HI}
< 15$ cm$^{-2}$. We see an increasing lack at $\log N_{HI} > 17$ cm$^{-2}$,
reaching a factor of approximately 30 by $\log N_{HI} = 19$ cm$^{-2}$,
and a factor of 70 for Damped Lyα lines (DLAs) with
$\log N_{HI} = 21$ cm$^{-2}$. The point shown as a plus from
Petitjean et al. (1993) is unreliable at $\log N_{HI} = 19$ cm$^{-2}$.
We are also uncertain about the errors in the f(N)
measurements, especially since we have not checked that the f(N)
values are consistent with the total mean absorption that
we assume at z = 2. Our simulation A has approximately
the correct total absorption, and hence the excess lines with
$\log N_{HI} < 14$ cm$^{-2}$ should make approximately the same
total absorption as the lack with $\log N_{HI} > 14$ cm$^{-2}$.

The lack at the higher columns, $\log N_{HI} > 17$ cm$^{-2}$,
is due to insufficient numerical resolution in collapsed halos
which give rise to this absorption and the lack of
self-shielding against the UV background radiation. The lack
is large enough that we must clearly remove high column
lines from both simulations and real spectra when we want
to make quantitative comparisons, as first pointed out by
Tytler et al. (2004) and Jena et al. (2005).



5 1D POWER SPECTRA OF THE FLUX



Following Croft & Gaztañaga (1997) and others, we define
the flux contrast as

$f(u) = (F/\bar{F}) - 1$ (11)

where u is velocity and $\bar{F}$ is the mean of the flux from
all spectra through a box, listed in Table 5. While our
Figure 15. The column density distribution at z = 2 from
observational data relative to the box A. We use linear
interpolation on the f(N) values for box A from Table 7 to find
values at the $\log N_{HI}$ given in data tables. The observations
points are from Kim et al. (1997) (×), Petitjean et al. (1993) (+)
and O'Meara et al. (2007) (squares: MIKE, triangles: UVES).
We make no changes to the f(N) values from O'Meara et al.
(2007, Table 5 & Fig. 7) because they use $\Omega_\Lambda = 0.7$ while we
use 0.73, which is a small difference, and they find no evidence
for rapid evolution. If instead, the number of these lines per
unit z were to evolv e as rapidly as N{z) ex (1 + z)^'^ then 
f(N, X) Qc{l + zf-^ jO'Meara et al.l | |20071) find that the power 
drops by approximately 0.5 for z > 2 when we change from 2 to 
X) and the points drop by 0.23 for MIKE and 0.33 for UVES. 
We make two types of corrections to all the other points. We de- 
fine /(Af, X) usi ng X(z) and dX/d z as in Eqn. [ lOlfor our assu med 
cosmology. Both lKim et al.l lll997l ') and lPetitiean et al] lll993l') use 
Xj,{z) = 0.b[{l + zf - 1] for r2A= and qo = giving dXp/dz = 
(1 -I- z). We calculate f(N,X) = f(N, Xp}(dXp/dz)/(dX/dz), 
at the mean redshift of each sample, where f{N, Xp) arc the 
published values. For iKim et all (wdlh we take the f{N, Xp) 
values from Table 2, with a mean z = 2.31 from two QSOs, 
and we multiply these v alues by a factor d{Xp)/dX = 0.980. 
For lPetitiean et al.l lll993l) we take the f{N, Xp) values from Ta- 
ble 2. We assume a mean z = 2.8, the mean Zem from Table 
1, weighted by the number of lines and evaluated for hya at 
a rest wavelength of 1120 A. We multiply these f{N, Xp) val- 
ues by a factor d{Xp)/dX = 1.037. For log Nhi< 17.2 cm"^ we 
make corrections for redshift e volution assuming f(N) pe r unit 
2 evolves as N{z) (X (1 + z)^-* llKirkman et al.ll2005l. l2007t) . and 
henc e f(N,X) o c (1 + z )^-^. We then multiply the f(N,X) values 
from Kim et al. (1997) by 0.798 and those from Petitjean et al.
(1993) by 0.581. For $17.2 < \log N_{HI} < 19.1$ cm$^{-2}$ we assume
that the Njz) oc (1 -|- z)^-^ jSareent et al.lll989l: [Lanzettalll99ll : 
Stengler-Larrea et al. 1995) and a mean redshift of 3.0. These
evolution correction factors could have large errors. The point from
Petitjean et al. (1993) at $\log N_{HI} = 19.09$ cm$^{-2}$ is for a large range,
$17.68 < \log N_{HI} < 20.5$ cm$^{-2}$. We plot the point at the mean
of the range, but lower $\log N_{HI}$ lines tend to be more common, so the
point should be plotted at some (unknown) lower $N_{HI}$ value.



signal looks like that used by Croft et al. (1998) and
McDonald et al. (2000), there is an important difference.
They both take $\bar{F}$ to be the mean flux in each spectrum.
We discuss this alternative choice below.

Since the mode of the flux in each pixel is larger than 
the mean flux, f(u) is typically larger than zero. This
definition resembles that of $\delta_{CDM} - 1$, except that $\delta_{CDM}$ involves
division by the mean density of CDM that is a parameter 
input into the simulations and identical for all. In contrast, 
the mean flux is not known until spectra are made, and it 
varies from simulation to simulation and with z. 

We compute the Fourier transform of the flux contrast
using f(u) in place of $\delta_{CDM} - 1$ in Eqn. 5. We measure the
(one-dimensional) flux power of each sight line in the z di- 
rection, and we present the average of the power from all 
sight lines. We have explicitly checked that we obtain the 
same power as does McDonald from the same spectrum. 

In Figure 16 we show the power spectrum of the flux,
$P_F$, in this case from all spectra parallel to the z axis in
each simulation. We tabulate values in Tables 8 and 9 and
in Figure 17 we show the power divided by that in
simulation A. In contrast to the CDM power, the differences
between simulations are small, and of the opposite sign. In
general, the larger boxes have smaller flux power, the
opposite of the trend that we saw in Fig. 5 for the power of
the CDM. On the largest scale sampled by a box, the power
at log k < -2 s/km in the A2 and A3 boxes is slightly
less than that in the largest A box. This might be
simply the effect of the larger modes in the larger boxes. On
intermediate scales -1.5 < log k < -0.6 there is a
systematic decrease in power with increasing box size, with
larger changes on smaller scales (larger k). However, on the
smallest scales, log k > -0.5 s/km, corresponding to sine
wavelengths $\lambda < 20$ km s$^{-1}$ (4 cells in the simulation), the
trends change direction. The differences between the
simulations become less, and A2 returns to approximately the
same power as A. The largest deviations in the ratios of the
power of the flux are seen at log k > -0.7 (s/km),
corresponding to changes in the shapes of the narrow lines.
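The correspondence between wavenumber and sine wavelength used above is $\lambda = 2\pi/k$. The Python sketch below converts log k to a wavelength in km/s and in cells. The 5 km/s cell size is taken from the caption of Figure 10 and is treated here as an assumption; the function name is illustrative.

```python
import math

PIXEL_KMS = 5.0  # assumed velocity width of one cell, from the Figure 10 caption


def wavelength_kms(log_k):
    """Sine wavelength in km/s for a wavenumber given as log10 k (s/km)."""
    return 2.0 * math.pi / 10.0 ** log_k


lam_small_scale = wavelength_kms(-0.5)            # scale where the trends reverse
cells_small_scale = lam_small_scale / PIXEL_KMS
lam_intermediate = (wavelength_kms(-0.6), wavelength_kms(-1.5))
```

For log k = -0.5 this gives a wavelength just under 20 km/s, about 4 cells, while the intermediate range -1.5 < log k < -0.6 corresponds to wavelengths of roughly 25 to 200 km/s.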

The large changes in the power of the CDM with box
size, contrasted with the small changes in the power of the
flux, imply that the bias will change rapidly with box size,
approximately as does the CDM. Attention must be given
to the appropriate smoothing of the fields t o reduce the sen- 
sitivi ty of the CDM power to the box size (jMcDonald et all 

McDonald (2003) shows how the power of the flux
changes when he increases his box size from 28.2 to 56.3 to 
112.7 Mpc, while keeping the cell size constant at 220 kpc. 
He also sees that the power is lower in the smaller boxes 

We see that the change in power with box-size is
consistent with the simultaneous change in the b-value
distribution. Viel et al. (2003) showed quantitatively how power
of the flux on small scales responds to changes in line
b-values. For -2 < log k < -0.2 Viel et al. (2003) saw
essentially the same power from randomly placed Lyα lines as
from real spectra. They also showed that making all b-values
larger by a factor of two decreased the power at -1.5 < log
k < -0.7. Our larger boxes have larger b-values and they
show decreased power on these scales. If all other factors
are unchanged, we would expect higher temperatures for
the IGM in larger boxes, which we confirm in §6.2.
Table 8. The log of the power of the flux in the A series simulations. The subscripts are values of log (k) (s/km), and the R are the
ratios of the power (not the log) to the values in A at the same k. Entries marked ... are not available for that box.

Box     log P_    log P_    log P_    log P_    R_      R_      R_      R_
A       0.933     0.757     0.239     -1.310    1.00    1.00    1.00    1.00
A2      0.910     0.739     0.246     -1.279    0.95    0.96    1.02    1.08
A3      ...       0.767     0.253     -1.233    ...     1.02    1.03    1.20
A4      ...       0.742     0.325     -1.144    ...     0.97    1.22    1.47
A6      ...       ...       0.336     -1.060    ...     ...     1.25    1.78
A7      ...       ...       ...       -0.945    ...     ...     ...     2.32
A2KP    0.919     0.747     0.244     -1.325    0.97    0.98    1.01    0.97
A3KP    ...       0.784     0.255     -1.314    ...     1.07    1.04    0.99
A4KP    ...       0.778     0.342     -1.238    ...     1.05    1.27    1.18
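The R columns of Table 8 follow from the log P columns as $R = 10^{\log P - \log P_A}$. The Python sketch below checks this for the highest-k column of the A series. The dictionary and function names are illustrative; agreement is expected only to within the rounding of the three-decimal logs.

```python
# Highest-k log P column of Table 8 for the A series boxes.
LOG_POWER = {
    "A": -1.310, "A2": -1.279, "A3": -1.233,
    "A4": -1.144, "A6": -1.060, "A7": -0.945,
}
# Matching R column of Table 8, as printed.
TABLE_RATIO = {
    "A": 1.00, "A2": 1.08, "A3": 1.20,
    "A4": 1.47, "A6": 1.78, "A7": 2.32,
}


def ratio_to_reference(log_p, log_p_ref):
    """Ratio of two powers given their base-10 logs."""
    return 10.0 ** (log_p - log_p_ref)


ratios = {
    box: ratio_to_reference(log_p, LOG_POWER["A"])
    for box, log_p in LOG_POWER.items()
}
largest_mismatch = max(abs(ratios[box] - TABLE_RATIO[box]) for box in ratios)
```

The recomputed ratios match the tabulated ones to about 0.01, which shows that the smallest box has more than twice the flux power of box A at this k.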



Table 9. The 1D power spectrum of the flux in simulation A.
We include all wavenumbers below k ^ 0.01 (s/km). We then 
pick eight points from each log k decade and we force the last 
point to be the Nyquist frequency. We computed the error on the 
power from 4x4 tiles across the face of the box. The error is the 
standard error on the mean power from all the tiles. The Pe and 
its error refer to the evolving spectra that we discuss in |JB] We 
show powers of ten in parenthesis: 3.522(-8) = 3.522 × 10$^{-8}$.



log k 


P(k) 


Pe{fc) (km/s) 


error 




crrore 


-2 


9134 


10.016 


10.021 





441 




0.441 


-2 


6124 


9.371 


9.377 





409 




0.411 


-2 


4363 


8.134 


8.135 





381 




0.382 


-2 


3114 


7.576 


7.576 





363 




0.364 


-2 


2144 


6.865 


6.872 





311 




0.311 


-2 


1353 


6.446 


6.447 





283 




0.284 


-2 


0683 


6.053 


6.057 





270 




0.270 


-2 


0103 


5.790 


5.794 





250 




0.251 


-1 


9592 


5.407 


5.412 





236 




0.237 


-1 


6830 


3.264 


3.260 





124 




0.123 


-1 


5155 


1.855 


1.849 





651 


-1) 


0.648(-l) 


-1 


3949 


1.068 


1.068 





315 


-1) 


0.314(-1) 


-1 


3006 


6.243(-l) 


6.256{-l) 





170 


-1) 


0.171(-1) 


-1 


2232 


3.748(-l) 


3.749{-l) 





106 


-1) 


0.107(-1) 


-1 


1575 


2.269(-l) 


2.273{-l) 





600 


-2) 


0.620(-2) 


-1 


1005 


1.376(-1) 


1.376{-1) 





375 


-2) 


0.380(-2) 


-1 


0501 


8.435(-2) 


8.398{-2) 





226 


-2) 


0.229(-2) 


-1 


0049 


5.185(-2) 


5.171{-2) 





149 


-2) 


0.149(-2) 


-0 


8565 


7.716{-3) 


7.725{-3) 





266 


-3) 


0.272(-3) 


-0 


6630 


2.878(-4) 


2.904(-4) 





137 


-4) 


0.140(-4) 


-0 


5296 


2.210(-5) 


2.215{-5) 





121 


-5) 


0.127(-5) 


-0 


4277 


3.272{-6) 


3.633{-6) 





196 


-6) 


0.269(-6) 


-0 


3452 


7.803{-7) 


9.018{-7) 





451 


-7) 


0.954(-7) 


-0 


2759 


2.357(-7) 


3.350{-7) 


1 


341 


-8) 


5.048(-8) 


-0 


2162 


1.296(-7) 


2.193{-7) 


7 


166 


-9) 


3.727(-8) 


-0 


2041 


1.286(-7) 


2.073(-7) 


7 


110 


-9) 


3.522(-8) 
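For readers who want to load Table 9 into a script, the compact "mantissa(exponent)" notation described in the caption can be converted to ordinary floats. The following is a minimal sketch; the helper name `parse_table_value` and the regular expression are our own assumptions about how the entries are written, not part of the original analysis.

```python
import re

# Matches entries such as "10.016" or "3.522(-8)".
_ENTRY = re.compile(r"^\s*([+-]?\d*\.?\d+)(?:\(\s*([+-]?\d+)\s*\))?\s*$")


def parse_table_value(text):
    """Convert a Table 9 style entry to a float, e.g. '3.522(-8)' -> 3.522e-8."""
    match = _ENTRY.match(text)
    if match is None:
        raise ValueError(f"unrecognized table entry: {text!r}")
    mantissa = float(match.group(1))
    # Entries without a parenthesis carry an implicit exponent of zero.
    exponent = int(match.group(2)) if match.group(2) else 0
    return mantissa * 10.0 ** exponent


print(parse_table_value("3.522(-8)"))
print(parse_table_value("10.016"))
```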



5.1 Autocorrelation of the Flux 

Although containing exactly the same information as the 
power, the autocorrelation can better illustrate characteris- 
tics of the signal that are hard to see in the power, such as 
correlations over large scales. While a power spectrum is a 
complete statistical description of a random Gaussian field, 
the flux distribution is far from Gaussian, hence neither the 
power spectrum nor the autocorrelation will be a complete 
description. 

The autocorrelation of the flux for a given velocity shift
$\delta u$ can be computed directly from a transmission spectrum
as $C_F(\delta u) = \langle (F(u) - \bar{F})(F(u + \delta u) - \bar{F}) \rangle$, where $\bar{F}$ is
the mean flux in a box from Table 5. The brackets refer to
an average across the pixels of the spectrum. We choose to
obtain the autocorrelation from

$$C_F(\delta u) = \frac{1}{2\pi} \int P(k)\, e^{i k \delta u}\, dk \qquad (12)$$

where P(k) is the power of the flux contrast and we can use
the line of sight averaged power directly because we have
divided by the mean flux of each cube. The results we report
here, as with the flux power, are the average autocorrelation
profiles at each velocity shift from all lines of sight.

Figure 16. The power spectrum of the 1D flux contrast (Eqn. 11,
flux divided by mean flux in that simulation box) from A-series
boxes for all $N^2$ sight lines along the z-direction. The k is in s/km
and the smallest boxes give the smallest amount of power at log
k = -1.
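The equivalence between the direct pixel average and the power-spectrum route can be sketched numerically for a single periodic spectrum. The Python example below is illustrative only: it uses NumPy's discrete FFT, subtracts the mean rather than dividing by it, and the function names are ours. It assumes periodic wrapping of the spectrum.

```python
import numpy as np


def autocorr_direct(flux, mean_flux):
    """<(F(u) - mean)(F(u + du) - mean)> for every pixel lag, with periodic wrap."""
    d = np.asarray(flux, dtype=float) - mean_flux
    n = d.size
    # np.roll(d, -lag)[i] equals d[(i + lag) % n], i.e. the spectrum shifted by lag.
    return np.array([np.mean(d * np.roll(d, -lag)) for lag in range(n)])


def autocorr_from_power(flux, mean_flux):
    """Same quantity obtained as the inverse transform of the power."""
    d = np.asarray(flux, dtype=float) - mean_flux
    power = np.abs(np.fft.fft(d)) ** 2 / d.size
    return np.fft.ifft(power).real


rng = np.random.default_rng(0)
flux = rng.uniform(0.0, 1.0, 64)        # synthetic spectrum, not simulation output
direct = autocorr_direct(flux, flux.mean())
via_power = autocorr_from_power(flux, flux.mean())
print(np.allclose(direct, via_power))
```

In a periodic box the result at lag u equals the result at lag L - u, so only lags up to half the box length carry independent information.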

For the power spectrum, the longest non-zero mode is
one wave in the box. By analogy, for a sight line parallel
to the box edges, the largest lag in a box is $u = L/2$. We
expect the autocorrelation to decrease up to scales of $L/2$
and then to rise again, since a lag of $u$ in one direction is
simultaneously a lag of $L - u$ in the reverse direction. This
is in contrast with real spectra where the autocorrelation
will continue to decrease with increasing lag. The number
of samples of each lag drops from $N$ for lags of one pixel to
$N/2$ for lags of $L/2$. When we shift a spectrum by some lag,
we loop around the periodic boundary conditions, making
the first and last pixels adjacent, so that all shifted spectra
have the same length.

Figure 17. The power spectrum as in Fig. 16 but now divided
by the power from simulation A, which is the line at vertical value
1.

Figure 18. How the autocorrelation of the flux depends on
box size and the definition of the signal. Autocorrelation (vertical
axis) against velocity lag (horizontal axis). The solid curves were calculated
using the mean flux of the box. The dotted curves use instead
the mean flux of each sight line, $F_L$. We show the difference of
the two, from Eqn. 13, in the bottom panel. The signals from the
larger boxes extend farthest to the right.

In Figure 18 we show the autocorrelation of the flux cal-
culated using the mean of the flux from all spectra through
each box. The autocorrelation of the flux falls with increas-
ing lags, and it falls to lower values in larger boxes. The
larger boxes also have smaller correlation for most velocity
lags. However, the autocorrelation does not drop to zero on
the largest scales in a box; rather, it stops falling at some
low plateau value.

If we define the autocorrelation using the mean flux in
each sight line, $F_L$, instead of the mean of the box, then the
autocorrelations are all reduced. We show these autocorre-
lation functions as dashed lines in Fig. 18. The amount by
which the two autocorrelation calculations differ is equal to
the variance of the line of sight mean flux values about their
mean, which is the mean flux of each cube that we give in
Table 5,

$$\Delta C_F(\delta u) = \frac{1}{N^2} \sum_{L=1}^{N^2} (\bar{F} - F_L)^2 = {\rm Var}(F_L), \qquad (13)$$

where $N^2$ is the number of sight lines we use to sample
each $N^3$ box. The smaller boxes have larger sight-line to
sight-line variance, as can be seen in Fig. 19, and hence their
autocorrelation (and other statistics) is decreased the most
when we use the mean flux per sight line.
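The statement that the two autocorrelation definitions differ by the variance of the sight-line mean fluxes, independent of lag, can be checked with a small numerical experiment. The sketch below uses synthetic random spectra rather than simulation output, assumes periodic wrapping and equal-length sight lines, and the function name `mean_autocorr` is ours.

```python
import numpy as np


def mean_autocorr(spectra, use_sightline_mean):
    """Autocorrelation at every pixel lag, averaged over all sight lines.

    spectra has shape (number of sight lines, pixels per sight line).
    """
    spectra = np.asarray(spectra, dtype=float)
    if use_sightline_mean:
        reference = spectra.mean(axis=1, keepdims=True)   # F_L for each sight line
    else:
        reference = spectra.mean()                        # mean flux of the whole box
    d = spectra - reference
    n_pix = spectra.shape[1]
    # Periodic shift along each sight line, then average over pixels and sight lines.
    return np.array([np.mean(d * np.roll(d, -lag, axis=1)) for lag in range(n_pix)])


rng = np.random.default_rng(1)
spectra = rng.uniform(0.3, 1.0, size=(16, 32))            # synthetic stand-in data
difference = mean_autocorr(spectra, False) - mean_autocorr(spectra, True)
variance_of_means = spectra.mean(axis=1).var()
print(np.allclose(difference, variance_of_means))
```

The difference is the same at every lag because, with periodic wrapping, the mean of a shifted spectrum equals the mean of the unshifted one.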
Figure 19. The distribution of the mean flux per sight line,
$F_L$, for all simulations in the A series. The vertical scale shows
the number of sight lines, divided by the number of sight lines in
the A box. This division lowers the distributions from the smaller
boxes (lower histograms at $F_L$ = 0.8) making it easier to see each
distribution.

Figure 20. The change in the velocity field with box size. The
fraction of cells, on a log scale, as a function of the modulus of the
gas velocity in the z direction. We sample the fractions in steps
of 8 km s$^{-1}$ for A, 4 km s$^{-1}$ for A2, 2 km s$^{-1}$ for A3, 1 km s$^{-1}$
for A4, 0.5 km s$^{-1}$ for A6 and 0.25 km s$^{-1}$ for A7, and we report the
fraction of cells per 1 km s$^{-1}$, for all simulations, which accounts
for why the total area under the curves falls with increasing box
size.

6 VELOCITY FIELD, BARYON TEMPERATURE AND BARYON DENSITY

Having seen how the statistical properties of the Lyα for-
est depend on box size, we now return to examine the
changes in the velocity field, and the baryon temperature
and baryon density in the simulations, since these fields con-
trol how the Lyα forest changes.



Table 10. Statistics of the proper velocity (km/s) for the baryons
in the cells in the A series simulations.

Box    mean     median   max
A      136.4    138.2    1153.9
A2     110.8    103.0     764.3
A3      74.9     70.9     515.1
A4      47.2     45.0     277.2
A6      27.8     26.7     153.3
A7      14.1     13.8      59.2
Figure 21. Power law fits to the most common temperature at a
given baryon overdensity, from J05 Table 9. The fits shown, from
top to bottom on the right, are for the A2, A3 and A4 simulations.



6.1 How gas velocity changes with box size 

In Fig. 20 we see that the velocity of the cells increases dra-
matically with increasing box size. Velocities > 160 km s$^{-1}$
are common in the three largest boxes but non-existent in
the two smallest boxes. In Table 10 we list the mean,
median and maximum baryon velocity in each box. These
velocities increase by factors of 1.23 - 2.59 as we double
the box size, with the largest factors applying to the maxi-
mum velocity for the smallest pair of boxes. The maximum
changes the most because this is sensitive to the rare high
density regions. However, all cells show a systematic increase
in velocity, as illustrated by the factor of 1.34 increase in the
median velocity going from the A2 to A box.
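The quoted range of factors follows directly from the Table 10 entries. A minimal Python sketch of that arithmetic is given below; the variable and function names are ours, and the only inputs are the tabulated velocities.

```python
# Table 10: (mean, median, max) baryon velocity in km/s, largest box first.
velocity = {
    "A": (136.4, 138.2, 1153.9),
    "A2": (110.8, 103.0, 764.3),
    "A3": (74.9, 70.9, 515.1),
    "A4": (47.2, 45.0, 277.2),
    "A6": (27.8, 26.7, 153.3),
    "A7": (14.1, 13.8, 59.2),
}
boxes = ["A", "A2", "A3", "A4", "A6", "A7"]


def doubling_ratios(table, order):
    """Ratio of each statistic between a box and the next smaller box."""
    ratios = {}
    for big, small in zip(order[:-1], order[1:]):
        ratios[(small, big)] = tuple(b / s for b, s in zip(table[big], table[small]))
    return ratios


ratios = doubling_ratios(velocity, boxes)
all_values = [r for triple in ratios.values() for r in triple]
print(round(min(all_values), 2), round(max(all_values), 2))
print(round(ratios[("A2", "A")][1], 2))   # median velocity, A2 to A
```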

6.2 How gas temperature and density change 
with box size 

In Tytler et al. (2004, Fig. 19) we showed the temperature
of cells in simulation B as a function of their baryon density.
Simulation B has the same parameters as the A series used
here, but with cells that are half the size. We fit a broken
power law to the ridge line that specifies the most common
T at a given density, but noted that these fits were not very
satisfactory in shape. In Table 9 of J05 we fit single power
laws $T(\rho_b) = T_0 (\rho_b/\bar{\rho}_b)^{\alpha}$ to $0.2 < \rho_b/\bar{\rho}_b < 3$ in many simu-
lations, where $\bar{\rho}_b$ is the cosmological density of baryons. We
found values of $T_0$ = 11,982, 12,561 and 12,910 K for A4, A3
and A2, showing an increase in temperature at a given den-
sity with box size. We also saw a systematic increase in the
index $\alpha$ with box size, which corresponds to a larger differ-
ence in temperature at higher densities, and near identical
temperatures at $\rho_b/\bar{\rho}_b = 0.35$. In Figure 21 we show these
fits for simulations A4, A3 and A2.
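The two statements above (larger $T_0$ and larger $\alpha$ in larger boxes, yet near identical temperatures at an overdensity of 0.35) constrain how much the index must change. The sketch below computes the index difference implied if two power laws crossed exactly at 0.35. This exact crossing is our simplifying assumption, since the text says only "near identical", and the actual fitted indices are in J05 and are not reproduced here.

```python
import math

# T_0 values quoted in the text (K).
T0 = {"A4": 11982.0, "A3": 12561.0, "A2": 12910.0}
PIVOT = 0.35   # overdensity where the fits give near identical temperatures


def implied_index_difference(t0_big, t0_small, pivot=PIVOT):
    """alpha_big - alpha_small required for T_big(pivot) == T_small(pivot).

    Follows from t0_big * pivot**alpha_big == t0_small * pivot**alpha_small.
    """
    return math.log(t0_small / t0_big) / math.log(pivot)


d_a2_a3 = implied_index_difference(T0["A2"], T0["A3"])
d_a3_a4 = implied_index_difference(T0["A3"], T0["A4"])
print(round(d_a2_a3, 3), round(d_a3_a4, 3))
```

Both differences come out positive, in line with the statement that the index increases with box size.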

Figure 22. Contour plot of the temperature of cells against their
baryon overdensity. Blue and black contours correspond to A4 and
A boxes respectively. We generate the contours from the scatter
plot between the grid values of T divided by a power of $\rho_b/\bar{\rho}_b$, and
$\rho_b/\bar{\rho}_b$, by binning the x-axis and y-axis in logarithmic intervals of 0.001 and
0.02 respectively and computing the number of points in each 2D
mesh. The iso-level values show the fraction of points (in loga-
rithmic units) contained within each contour relative to the total
number of points in the simulation contained within the bound-
aries of each axis. The nearly vertical line on the left shows the
typical cells responsible for Lyα lines with log $N_{\rm HI}$ = 12.5 cm$^{-2}$
while the line on the right is for 14.5 cm$^{-2}$.

Fig. 22 is a contour plot of the temperature T of cells
as a function of baryon overdensity for simulations A and
A4. The vertical axis is T divided by a power of the baryon
overdensity, to remove much of the
tendency for T to increase with density. We are most
interested in the densities that produce the Lyα forest. We
found earlier that Schaye (2001, Eqn. 10) suggests that the
Lyα forest lines with log $N_{\rm HI}$ = 12.5 - 14.5 cm$^{-2}$ that we use
to study the b-value distribution typically come from baryon
overdensities 0.5 - 15.7. On Fig. 22 we show lines of constant
log $N_{\rm HI}$,



^q4^, / 8.82 X 10' 



N 



HI cm 



(14) 



obtained from Schaye (2001, Eqn. 10). We see, with close
inspection, that at overdensities above ~ 0.5 the contours
of the larger box have on average shifted to higher temper-
atures, particularly in the regions closer to the frequency
peak of this 2D distribution. The gas that makes the Lyα
forest absorption is clearly hotter in the larger box.

In Fig. 23 we show the pdf of the temperature per cell
for three different baryon overdensity ranges, 0.5 - 1.5, 1.5
- 5 and 5 - 15. For each density range, we show six distri-
butions, one for each box size. The temperature pdf shows
very little change with box size for low overdensities 0.5 -
1.5, but at the intermediate and especially the higher densities
the larger boxes have systematically higher temperatures.
This tendency of increasing temperature with box size at
higher but not lower overdensities confirms the power law
fits to the most common temperatures from J05 that we
showed in Fig. 21.

In Fig. 24 we show the mean temperature of cells as
a function of baryon overdensity for the A series simula-
tions. We see a dramatic increase in the mean temperature
in the larger boxes, especially at higher overdensities. The
percentage increase in the mean temperature with box size
decreases at lower overdensities, as we just saw for the much
more restricted range of densities in Fig. 23.

Figure 23. The pdf of the temperature (in K) of cells for three
contiguous ranges of baryon overdensity, the ranges that are re-
sponsible for Lyα lines with log $N_{\rm HI}$ > 12.5 cm$^{-2}$ (the histograms
on the left for the lower overdensities) to < 14.5 cm$^{-2}$ (histograms
on the right for the higher overdensities). We show the fraction
of all cells in the density range, and we sample the temperature
in steps of log T = 0.01.

Fig. 25 is like Fig. 24 but now showing the temperature
which is exceeded by 50% of cells, the median. We again see
that the larger boxes are hotter but the differences are much
smaller, especially at the low densities of the Lyα forest. The
much larger increase in the mean temperature comes from
relatively few cells that have undergone shock heating to
temperatures much larger than the median and well above
the temperature at which there is sufficient H I to make Lyα
lines.

In Fig. [26] we see that relatively few cells are attaining 
much higher temperatures in the larger boxes. The temper- 
atures above 10^ K come from shocks which are rarer and 
weaker in the smaller boxes because the velocities and peak 
densities are lower. 

Fig. 27 summarizes the changes in temperature with box
size, and shows that the trends of relevance to the Lyα forest
are only revealed by specific statistical measures.

In Fig. 28 we show a slice of the A box one cell thick.
We added two contours, one of which shows the minimum
typical overdensity for the IGM and the other the upper
overdensity. The arrows show the velocity of the cells. While
most cells making Lyα absorption are surrounded by cooler
gas, those that are flowing into the highest density regions
are next to hotter gas.

In Figs. 29 and 30 we show the pdf of the baryon density
per cell for different box sizes. We see that most of the pdf
moves to lower density in the larger boxes. In the density
range responsible for typical Lyα forest lines with log $N_{\rm HI}$ =
12.5 - 14.5 cm$^{-2}$ there are systematically fewer cells in the
larger boxes, which can explain why we saw in Table 5 less
absorption in the larger boxes. If the density distribution
drops by a constant factor for all densities relevant to the
Lyα forest, then the f(N) will remain unchanged in shape.
The log vertical scale in Fig. 30 shows that this is approx-
imately true, but in detail there is a slightly larger rela-
tive decrease in the number of low density cells. The larger
boxes then have relatively more lines with higher log $N_{\rm HI}$
values, as we have already seen in Fig. 14 for columns 13 <
log $N_{\rm HI}$ < 15 cm$^{-2}$. Lines with higher $N_{\rm HI}$ values tend
to be wider, since they come from higher densities where
the gas is hotter, which is consistent with the larger $b_\sigma$ values in the
larger boxes.

Figure 24. The mean temperature (K) of all cells as a function
of log baryon overdensity, evaluated in log overdensity intervals
of 0.1. At $-0.5 < \log \rho_b/\bar{\rho}_b < 2.5$ the order of the curves is that
of increasing box size towards the top.

Figure 25. The median temperature (50% are hotter) of all cells
as a function of log baryon overdensity, sampled in bins of log
overdensity = 0.1. The order of the curves is that of increasing
box size to the top at $\log \rho_b/\bar{\rho}_b = 2.2$.

Figure 26. The fraction of cells as a function of baryon tempera-
ture (K), sampled in bins of log T = 0.1. The larger boxes extend
farther to the right.

Figure 27. The mean (circles) and median (squares) baryon tem-
perature (K) as a function of box size. We evaluate these statistics
in bins of 0.1 in log T, for both the entire box, and for the densities
that produce the Lyα forest.
Figure 28. A 30x30 Mpc section of the A box, one cell thick and
containing the highest density cell in the whole box. The color
scale shows the baryon temperature, with the red to blue bound-
ary at the typical temperature for a Lyα line with log $N_{\rm HI}$ =
12.5 cm$^{-2}$. The red regions are cooler and the yellow and whitish
regions are hotter than the blue regions. We show two contours,
one for the baryon overdensity of 0.5 and the other for 15, the
range responsible for many Lyα forest lines. The arrows show the
baryon velocity with a length linearly proportional to the ampli-
tude of the velocity. The typical arrow in the upper right is 280
km s$^{-1}$.

Figure 29. The effect of box size on the pdf of the log baryon
overdensity in each cell. At $\log \rho_b/\bar{\rho}_b = 0$ the curves for larger
boxes are lower on the plot. The Lyα forest typically comes from
$-0.3 < \log \rho_b/\bar{\rho}_b < 1.2$, all to the right of the peak, where the
larger boxes have systematically lower fractions of their cells.

Figure 30. As Fig. 29 but with the log of the fraction of cells
on the vertical axis to show the relative changes. The curves for
larger boxes extend farther to the right and are lower on the plot
at $\log \rho_b/\bar{\rho}_b = 0$.

7 SIMULATIONS WITH NEARLY CONSTANT
MEAN FLUX AND b-VALUES

We ran a second series of simulations using input parameters
that we adjusted to make the mean flux and $b_\sigma$ values ap-
proximately constant, at the values for simulation A. We
adjusted the intensity of the UVB, $\gamma_{912}$, and the amount of
heating per He II ionization, $X_{228}$. We determined these pa-
rameters iteratively using scaling relations similar to those
described in J05. At redshift z = 2 the ionizing background
was multiplied by the factors $\gamma_{912}$ listed in Table 1. For
these KP simulations, we also augmented the UVB by ad-
ditional factors to make the mean flux at those redshifts
closer to the values reported in Keck HIRES spectra by
Kirkman et al. (2005). At z ≤ 2 we multiplied the fluxes
by 1. At z = 2 - 3 we multiplied by $1.3^{(z-2)}$, and at z > 3
we multiplied by 1.3.
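The redshift-dependent extra UVB factor for the KP series can be written as a small piecewise function. The Python sketch below assumes that the intermediate factor is 1.3 raised to the power (z - 2), which is the reading that joins continuously to the factor of 1 at z = 2 and to 1.3 at z = 3. That reading, and the function name `extra_uvb_factor`, are our assumptions rather than something the text states explicitly.

```python
def extra_uvb_factor(z):
    """Additional multiplicative factor applied to the UVB in the KP series.

    Assumes the factor is 1 up to z = 2, 1.3**(z - 2) for 2 < z <= 3,
    and 1.3 at higher redshift.
    """
    if z <= 2.0:
        return 1.0
    if z <= 3.0:
        return 1.3 ** (z - 2.0)
    return 1.3


for z in (1.5, 2.0, 2.5, 3.0, 4.0):
    print(z, round(extra_uvb_factor(z), 3))
```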

In Table 5 we see that the mean fluxes are indeed similar
to that of A, although the modes are less so. In Table 6 we
see that the $b_\sigma$ values are all similar and between those for
A and A2. We could have iterated further to improve the
agreement, but felt that this was not necessary for this work.

In Fig. 31 we show the b-value distribution for the KP
series. The three are more similar to each other and to A
than were the similar sized simulations from the A series, as
we expect.

In Fig. 32 we show the power of the flux contrast
for the KP simulations, and in Fig. 33 we show the same
divided by the power from A. Comparing to Fig. 17 we see
that the power in the KP series is factors of 2 - 3 closer to
the power in A than were the simulations of the same size
in the A series, though the differences are less reduced on
the smallest scales.
Figure 31. The b-value distribution for boxes in the KP-series,
which should be compared to Fig. 12 for the A series simulations.
Just to the right of the peak, the curves are from the top A4kp
(green), A3kp (blue) and A2kp (violet).

Figure 32. The power spectrum of the 1D flux contrast (Eqn.
11) from KP series, A2kp (violet), A3kp (blue) and A4kp (green)
together with A (black). The curves from the larger boxes extend
farther to the left. Compare to Fig. 16 for the entire A series.

Figure 33. The power spectrum of the 1D flux contrast (Eqn.
11) from KP series divided by the power from A. From bottom
to top at log k = -0.5 s/km we show A2kp (violet), A3kp (blue)
and A4kp (green). Compare to Fig. 17 which is on nearly the
same vertical scale.

Figure 34. Distribution of the flux per pixel for KP and A series.
We show the fraction of pixels divided by those for simulation A.
We show simulations A2, A3 and A4 (solid lines) from top to
bottom at Flux = 0.7, and below, A2kp, A3kp and A4kp (dashed
lines). Compare to Fig. 11 for the A series.



In Fig. 34 we see that the distribution of the flux in the
KP series is significantly closer to A than are the A series
simulations of the same size. The difference is about a factor
of two for A3 and A4, such that A4KP is similar to A3, and
A3KP is similar to A2. The improvement seems larger for
the larger boxes. For A2 the frequency of Flux = 0.8 is 1.05
times the frequency in A, while for A2KP this is 1.02. There
are even larger improvements at fluxes above 0.97.

In general, we see that the adjustments in $\gamma_{912}$ and
$X_{228}$ that we made for the KP series allow a given KP sim-
ulation to have approximately the Lyα forest statistics of an
A series simulation that is twice the size. For some applica-
tions, we can then save a factor of 8 in computing resources.
We can use larger $X_{228}$ values, corresponding to more heat
input, to partially compensate for the effects of limited box
size. We simultaneously need smaller $\gamma_{912}$ values to maintain
the same mean flux.



8 SIGNAL DEFINITION AND 
NORMALIZATION 

The division by the mean flux can introduce significant am-
biguity, because there are many ways to select the mean,
and the mean is a function of z. There are two main ways
of defining the mean flux: global and local.

In this paper we use global definitions for the mean flux
that come close to approximating the true mean flux at each
z. We have divided spectra by the mean flux from the whole
of each simulation box. We could alternatively have taken
an estimate of the mean flux from a calibrated measure-
ment (Tytler et al. 2004; Kirkman et al. 2005, 2007). When
we use real spectra we must remove the metal lines and the
strong Lyα lines of LLS and DLAs because they add signif-
icant absorption to the Lyα forest that is not from the low
density IGM and that will be missing from simulations.

In contrast, Hui et al. (2001), Kim et al. (2004b),
McDonald et al. (2006) and others have used local measures
of the mean flux. They divide each spectrum by its own mean
flux, since their goal is to avoid continuum fitting or to re-
duce the errors in the continuum level. Kim et al. (2004b,
Fig. 2) obtained similar power spectra at k > 0.002 s/km
when they divided real spectra by either fitted continua or
the mean flux.

We do not advocate division by the local mean flux
for several reasons. We need to know the length of each
spectrum to make a precise comparison with other data or
simulations. For extremely long artificial spectra, division
by the mean flux in individual spectra is not very different
from dividing by the overall mean flux of the whole sample
of many spectra, but for short spectra, including those
from our boxes, the differences are huge.

In real spectra the mean flux varies significantly from
spectrum to spectrum due to large scale structure. In
Tytler et al. (2004, Figs. 13 and 16, Table 4) we found
that at z = 1.9 the standard deviation of the mean ab-
sorption in 121 Å segments from the low density IGM alone
is about 1/3 of the mean amount of absorption. In addition,
the metal lines and strong Lyα lines and the low density
IGM all contribute similar amounts to the variation in the
total amount of absorption. Hence, when we divide by the
mean flux in each spectrum, we are removing much of the
large scale structure signal, and introducing undesired cor-
relations with the metal lines and strong Lyα lines, with no
guarantee that we are removing any errors in the continuum
level. Indeed, the continuum level errors of most interest are
on the short scales of the flux calibration errors and the
emission line shapes, and not necessarily correlated with the
mean flux across the whole of a spectrum.

Figure 35. The 1D power spectra of the flux where the signal is
divided by the mean flux in each sight line, and not the mean of
the box. Compare to Fig. 16 that is identical except that there
we divided the flux in each spectrum by the mean flux in the
simulation box.

In Figure 35 we show the power spectra of the flux ob-
tained when we divide the flux in each spectrum by the mean
flux of that spectrum, $F_L$. We show in Fig. 36 the power of
the flux, divided by the mean flux in a sight line, and then
divided by the same quantity for box A. The power in the
larger boxes is little changed from that in Fig. 16 where we di-
vided by the mean flux in the whole box, but the power in
the smaller boxes is raised.
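The difference between the global and local normalizations can be illustrated with a short NumPy sketch. The estimator below uses one common discrete convention, P(k) = |FFT(contrast)|^2 dv / N, which is our assumption. The paper's own definition of the flux contrast power is given by its Eqn. 11 and may differ in normalization. The spectra are synthetic random numbers, not simulation output, and the function name is ours.

```python
import numpy as np


def flux_contrast_power(spectra, dv, local_mean=False):
    """Sight-line averaged 1D power of the flux contrast F/mean - 1.

    spectra: array of shape (sight lines, pixels); dv: pixel width in km/s.
    local_mean=False divides by the mean flux of the whole box,
    local_mean=True divides each spectrum by its own mean flux.
    """
    spectra = np.asarray(spectra, dtype=float)
    n_pix = spectra.shape[1]
    if local_mean:
        mean = spectra.mean(axis=1, keepdims=True)
    else:
        mean = spectra.mean()
    contrast = spectra / mean - 1.0
    amplitude = np.fft.fft(contrast, axis=1)
    power = (np.abs(amplitude) ** 2).mean(axis=0) * dv / n_pix
    k = 2.0 * np.pi * np.fft.fftfreq(n_pix, d=dv)   # s/km, both signs of k
    return k, power


rng = np.random.default_rng(2)
spectra = rng.uniform(0.2, 1.0, size=(8, 64))
k, p_local = flux_contrast_power(spectra, dv=2.0, local_mean=True)
k, p_global = flux_contrast_power(spectra, dv=2.0, local_mean=False)
print(p_local[0], p_global[0])
```

With the local mean, every sight line has zero mean contrast, so the k = 0 term vanishes and the sight-line to sight-line variation of the mean flux is removed from the signal. With the global mean that variation is retained.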



9 HOW RESOLUTION CHANGES THE 
SIMULATED IGM 

In Figs. 37 and 38 we show how the pdf of the baryon over-
density per cell varies with the cell size. For the common den-
sities shown in Fig. 37 we see that simulations with smaller
cells have systematically lower densities. These changes are
larger for cell sizes 150 to 75 to 37.5 kpc, but the changes
are too small to see when the cell size drops from 37.5 to
18.75 kpc. The explanation for this trend to lower densities
is given in Fig. 38 where we see that simulations with smaller
cells contain a few cells with much larger densities. These
cells contain the baryons that are depleted from the bulk of
the volume.

In Fig. 39 we see the mean temperature of cells at a
given baryon overdensity increases with decreasing cell size.
The increase is minimal when we decrease the cells from 37.5
to 18.75 kpc, suggesting that 37.5 kpc is small enough for the
current work.

In Fig. 40 we show the median instead of the mean tem-
perature. The changes are now much smaller, except near log
overdensity = 1.7 where we see the peak temperatures and
no sign of convergence as we decrease the cell size.

Figure 36. The 1D power spectra of the flux where we divided
the flux in each spectrum by the mean flux in that spectrum.
As Fig. 35 but now we divide the power by that in the A box.
Compare to Fig. 17 where we divided each spectrum by the mean
flux in the whole box.

Figure 37. The pdf of the log baryon overdensity per cell for the
B series simulations which differ in only their cell size. We show
the fraction of cells sampled in intervals of $\log \rho_b/\bar{\rho}_b = 0.1$.

In Fig. 41 we show how the power of the flux depends
on the cell size. With smaller cells there is less power on
the largest scales and more on small scales. The boxes with
smaller cells begin with more power in total because their
power spectra extend to smaller scales. Their structure be-
comes non-linear on small scales earlier and this encourages
the growth of power on small scales at the expense of large
ones.

Figure 38. As Fig. 37 but with a log scale vertically. We sample
in bins of size 0.1 in the log baryon overdensity. Simulations with
smaller cells contain larger densities and extend farther to the
right.

Figure 39. The mean temperature of cells as a function of the
log baryon overdensity, sampled in bins of log overdensity 0.1.

In Fig. 42 we show the ratio of the power of the flux
to the power from the B2 simulation that has the smallest
cells. In general, the boxes with smaller cells have smaller
flux power on the largest scales (small k) and more power
on the smallest scales. We see that the maximum k value at
which the power is larger than in B2 shifts systematically to
higher values with the smaller cells: from log k = -1.32 s/km (B3,
150 kpc cells) to -1.0 s/km (A4, 75 kpc cells) to -0.5 s/km
(B, 37.5 kpc cells). We see that the factor by which the power is
larger than in B2 on large scales (small k) is approximately
constant over a range of k values, at approximately 1.35 for
A4 and 1.07 for B. This suggests rapid convergence as the
cell size decreases below 18.75 kpc. However, the conver-
gence on smaller scales is much less rapid. The ratio of the
power to that in B2 is a minimum on scales near a factor of
two larger than the Nyquist frequency. These minimum val-
ues for the power ratios are large and approach 1.0 slowly as
we decrease the cell size: from 0.52 (B3) to 0.75 (A4) to 0.83
(B). This behavior suggests that cells smaller than 10 kpc
will be needed to get the power at log k ~ 0 s/km to within
a factor of 0.9 of the value in a simulation with much smaller
cells.

Figure 40. As Fig. 39 but showing the temperature that is ex-
ceeded in 50% of cells, the median, sampled in bins of log over-
density = 0.1.

Figure 41. The 1D power spectra of the flux of the B series
simulations which differ in only their cell size. We divided the
flux in each spectrum by the mean flux in that simulation box.
We terminate the power spectra at the Nyquist frequency for that
cell size: B3 (150 kpc cells, log k = -0.5, red line), A4 (75 kpc,
log k = -0.2, dashed black line), B (37.5 kpc, log k = +0.1 s/km,
blue line), B2 (18.75 kpc, log k = 0.4 s/km, black line).

One other feature of the power spectra of the flux is
more troubling. We see that the power increases steeply
on the smallest scales, just above the Nyquist frequency.
We saw similar behavior in Fig. 16 for the A series. Early
in this investigation we saw much larger versions of these
upturns in power which were caused by errors in the gen-
eration of spectra that led to discontinuities in the flux.
We continued searching for errors and found no more. How-
ever, the behavior is clearly not physical, because simula-
tions with smaller cells do show that the power ratios con-
tinue to decline smoothly on decreasing scales. We should
not use the power from these simulations on scales within
log k = 0.2 s/km of the Nyquist frequency.

McDonald (2003, Fig. 6) shows how the power of the
flux varied for three hydro-PM simulations (no shocks) in
6.25 Mpc boxes with cell sizes of 24.4, 48.8 and 97.6 kpc.
While we both see that the largest cell size gives results very
different from intermediate sizes (50 - 75 kpc), we do not
see much else in common between our results. This confirms
the point made by McDonald (2003), that the results of
resolution studies depend on the nature of the small scale
force calculations and physics.

In Fig. 43 we see that the b-value distribution moves to
significantly smaller velocities with smaller cells, except for
B2 which has slightly larger velocities than B, reversing the
trend. In Table 6 we list the $b_\sigma$ values for the Hui-Rutledge
fitting formula. The Δ column shows that the $b_\sigma$ value drops
4.2 km s$^{-1}$ from 150 to 75 kpc cells, and then 2.0 km s$^{-1}$
going to 37.5 kpc, but it increases by 0.2 km s$^{-1}$ going to
18.75 kpc cells. Since the internal error in the measure-
ment is about 0.8 km s$^{-1}$, 35 kpc cells seem to give a fair
estimate of the $b_\sigma$ that would apply to a simulation with
much smaller cells. The fitting function gives an excellent
representation of the b-value distributions. In detail we see
systematic differences between these distributions and the
function, e.g. the function is too high around the most com-
mon b-values (especially for the larger cell sizes) and has too
many lines with b > 40 (for B2, B) or > 50 km s$^{-1}$ (for B3).
As for the A-series, we use only lines with b < 40 km s$^{-1}$
when we estimate the $b_\sigma$ values.
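For reference, the Hui-Rutledge fitting function has a single parameter $b_\sigma$ and the form dN/db ∝ ($b_\sigma^4/b^5$) exp($-b_\sigma^4/b^4$). The Python sketch below writes it as a normalized pdf together with its closed-form cumulative distribution and mode. It is a minimal illustration, not the fitting code used for Table 6, and the value $b_\sigma$ = 24 km s$^{-1}$ in the usage lines is a hypothetical number chosen only to exercise the functions.

```python
import numpy as np


def hui_rutledge_pdf(b, b_sigma):
    """Normalized Hui-Rutledge distribution: 4 b_sigma^4 / b^5 * exp(-(b_sigma/b)^4)."""
    b = np.asarray(b, dtype=float)
    x = (b_sigma / b) ** 4
    return 4.0 * x / b * np.exp(-x)


def hui_rutledge_cdf(b, b_sigma):
    """Fraction of lines narrower than b; the pdf integrates to exp(-(b_sigma/b)^4)."""
    b = np.asarray(b, dtype=float)
    return np.exp(-(b_sigma / b) ** 4)


def hui_rutledge_mode(b_sigma):
    """Most common b-value, where d(pdf)/db = 0: b^4 = (4/5) b_sigma^4."""
    return 0.8 ** 0.25 * b_sigma


B_SIGMA = 24.0   # km/s, hypothetical value for illustration only
b_grid = np.linspace(5.0, 40.0, 200001)
pdf = hui_rutledge_pdf(b_grid, B_SIGMA)
print(hui_rutledge_mode(B_SIGMA), float(hui_rutledge_cdf(40.0, B_SIGMA)))
```

Because the cumulative form is analytic, the fraction of lines excluded by a cut such as b < 40 km s$^{-1}$ can be read off directly as 1 - exp($-(b_\sigma/40)^4$).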

In Fig. 44 we see that the simulations with smaller
cells have factors of several more Lyα lines with the low-
est column densities log $N_{\rm HI}$ < 13 cm$^{-2}$. However, the small
cells also give about 20% fewer lines with 13 < log $N_{\rm HI}$

Figure 42. As Fig. 41 but showing the ratio of the power of
the flux to the power from the B2 simulation. On the far left
the simulations are, from the top, B3 (150 kpc cells, red line), A4
(75 kpc, dashed black line), B (37.5 kpc, blue line), B2 (18.75 kpc,
black line at 1.0).







Figure 43. The distribution of the b-values for Lyα lines with
12.5 < log N_HI < 14.5 (cm^-2) and line central optical depth
τ > 0.05 in B series simulations. The jagged thin lines are the
distributions of the b-values and the smooth curves are the
Hui-Rutledge fits to each pdf. From the right, at a fraction of 0.004,
the simulations are B3 (150 kpc cells, red line), A4 (75 kpc, dashed
black line), B2 (18.75 kpc, black) and B (37.5 kpc, blue line), which
is out of order and largely hidden under B2.

Figure 44. How the column density distribution depends on the
cell size of a simulation. We show the f(N), measured using all
lines with line central optical depth τ > 10~^. Here only we
evaluate f(N) per unit z rather than X, and we divided by the
f(N) from the B2 simulation. We give values evaluated in bins
of width 0.1. On the far left the simulations are, from the bottom,
B3 (150 kpc cells, red line), A4 (75 kpc, dashed black line), B
(37.5 kpc, blue line), B2 (18.75 kpc, black line at 1.0).



Table 11. The Convergence of Statistics of the A series. The
ratios are of quantities from smaller boxes A4, A3 or A2 to the
value in the largest box A. When a range is indicated, we list the
largest value in that range.

Quantity                        A4/A     A3/A     A2/A
Flux mean                       1.0031   1.0023   1.0008
Absorption = 1 - flux mean      0.9790   0.9844   0.9946
Flux pdf (F = 0.995-1.0)        0.62     0.81     0.93
Flux pdf (F = 0.8)              1.24     1.13     1.05
Typical line width (km s^-1)    0.903    0.929    0.959
f(N), log N_HI = 12.5 - 14.5    0.83     0.85     1.10
f(N), log N_HI = 15             0.73     0.81     0.84
Flux P(k = 0.01)                0.966    1.023    0.959
Flux P(k = 0.1)                 1.465    1.194    1.074
Frequency of CDM density        0.93     1.03     0.98
Mode CDM density                0.2      0.2      0.50

Table 12. Comparison of Statistics from Simulation A to those
from Data. The column headed "A" lists the value of the quantity
in box A. The column A2-A lists the value of the parameter in
box A2 minus the value in box A. The errors on the data values
are guessed, not precise, values.

Quantity                     A        A2-A      Data     σ(data)
Flux mean                    0.8714   -0.0007   0.869    0.01
b_σ (km s^-1)                26.7     1.1       23.6     1
log f (log N_HI = 14.3)      -13.49   -0.03     -13.39   0.2
Flux P(k = 0.01)             5.8      0.23      7        1
Flux P(k = 0.1)              0.049    -0.004    0.13     0.05



< 17 cm^-2, where the precise range depends on the cell
size. Hence simulations with smaller cells are slightly farther
from data than was simulation A (75 kpc cells) that
has too few lines of high log N_HI (Fig. 15).



10 CONVERGENCE AND COMPARISON 
WITH DATA 

In Table 11 we summarize how various statistics change as
we increase the size of the simulation box. In Table 12 we
compare the difference between the values we measure in
the A and A2 boxes to the likely errors from measurements
of data.

We see very small changes in the mean flux with in- 
creasing box size, in part because the amount of absorption 
is small compared to the mean flux. The changes are better 
seen in the amount of absorption itself. The rate of convergence
suggests that the mean flux in A is within 0.0007 of the
value expected in a much larger simulation. This is about
a factor of 14 smaller than the measurement error of
approximately 0.01, from sample size, continuum level errors
and difficulties removing absorption from metal lines and
strong Lyα lines down to some fixed N_HI value (Tytler et al.
2004; Kirkman et al. 2007; Kim et al. 2007). The mean flux
in simulation A is essentially identical to that from Eqn. 10 
of J05, scaled to z = 2, and we expect this to remain true 
in a much larger box. 

For the flux pdf, Figure 11 indicates that simulation A
will be within about 5% of the frequencies for a much larger
simulation.



For b_σ the error from the simulation box size is comparable
to that for data. The Δ values in Table [S] do not show
much evidence for convergence, since the change in b_σ from
A2 to A is larger than the change from A3 to A2, and from
A4 to A3. This slow convergence can be traced back to the
effects of the long modes of the CDM density fluctuations 
that change the sizes of the absorbing regions, the velocities 
and the temperatures. 

We also see that the b_σ for A is significantly larger than
for data (Kim et al. 2001; Jena et al. 2005) and the difference
will be still larger in a larger box. To better match data,
we need a simulation with less heat input (smaller X_228)
or larger σ_8 (Figs. 21 and 38 of J05), which is a surprise
since the value we are using, σ_8 = 0.9, is large compared
to the WMAP 3-year suggestion. We (Tytler et al. 2004;
Jena et al. 2005) and others (Viel et al. 2006; Seljak et al.
2005; Viel et al. 2007) have previously noted that the Lyα
forest data prefer much larger σ_8 values than does the CMB
anisotropy. Slosar et al. (2007) use Lyα forest, supernovae
and galaxy clustering data with WMAP 3-year data to estimate
n = 0.965 ± 0.012 and σ_8 = 0.85 ± 0.02, compared to
σ_8 = 0.80 ± 0.03 without the Lyα forest.

The changes that we will need to make for the simulations
to match the column density distribution of data will also
change the b-value distribution and the b_σ value. The minimum
b-value in the Lyα forest increases as N_HI increases
(Kirkman & Tytler 1997) until we reach log N_HI = 15 cm^-2,
after which the values start declining in our HIRES spectra
and in simulations (Misawa et al. 2004, Figs. 3, 5). Since
simulations need fewer lines with log N_HI < 14 cm^-2 and,
in compensation to conserve the total absorption, more at
14 < log N_HI < 15 cm^-2 (Fig. 15), the mean b-values will
increase, exacerbating the differences with data.

The lack of high column lines is the second conspicuous 
difference between our simulations and data. As noted previ- 
ously, this is due to insufficient spatial resolution and lack of 
self-shielding in collapsed dense halos. In Fig. 15 we saw that
our simulations have too many lines with log N_HI < 14 cm^-2,
a slight lack of lines with log N_HI = 14 - 15 cm^-2, and a large
lack with log N_HI > 17 cm^-2. This lack of high column lines
will reduce the power to below that in real spectra that include
such lines. We also noted that our sight lines that are
parallel to the box sides are too short to contain the full
damping wings of DLAs. Fig. 14 shows convergence as the
box size increases and suggests that the f(N) values from
simulation A for log N_HI = 12.5 - 14.5 cm^-2 are within about
10% of the values we would obtain from a much larger box.

In Figs. 45 and 46 we compare the power of the flux
of the Lyα forest in data to that in our simulations. The
power from the simulations is less than in the data at all
k values. The power in the simulations is too low by about
20% at -1.6 < log k < -1.1 s/km, rising to about 50% on
large scales at log k < -2 s/km.

We are most concerned about the missing power on 
large scales. There we have SDSS measurements that we 
trust more than those from J05 on small scales, and there 
should be no problems from residual metal lines in the real 
spectra at large scales. The values that we give for the errors 
on the power of the data in Table 12 are guesses based on
the spread between different measurement values. We note 
that the differences between the simulation and data seem 
less at large k since only a small change in k would be needed 
to align the two. However, the errors on k are very small, 
and hence we are interested in the vertical change in the 
power and not a horizontal shift in k. 

We had expected the power of the simulation to be
smaller than in data on small scales (large k) because the
b-values in the simulations are larger than in data. The sense
of the differences is consistent: larger b-values correspond
to less power at log k > -1.5 s/km (Viel et al. 2003). We
also knew that we lacked large scale power when we began
this investigation and we had hoped to understand this
difference, but we have not.

A major conclusion of this paper is that a much larger 
box will not bring the power from the simulations up to 
that in the data since we saw in Fig. [17] that the effects of 
doubling the box size are ten times smaller than the amount 
of missing power. 

We have also shown that improving the resolution of 
a simulation by reducing the cell size makes the simulation 
more different from data at large scales. In Fig. [42] we saw 
that when we decrease the cell size, from 75 kpc to 18.75 kpc 
we decrease the power in the simulation at log k < —1 s/km, 
by 30 - 40% at the largest scales. Using these small cells, 
the power in the simulation is then about a factor of two 
(1.5 × 1.35) below that in data. Simultaneously, the power
increases for the largest few k values, which brings the
simulation closer to the data. The power from the B2 simulation
is the lowest of all in Figs. 45 and 46 and yet it has the same
input parameters and box size as A4 and 4 times smaller
cells than the A series. In J05 we noted that B2 has a lower
b_σ value (corresponding to higher small scale power) but
higher mean flux (lower power) than the A-series. Hence to

Figure 45. Comparison of the power of the flux in the Lyα forest
with that for our simulations. Lines show the power of spectra
from simulations A (black), A2 (violet), A3 (blue) and A4 (green),
our usual color scheme. The dashed green line is for simulation B2
from J05. We use flux spectra that travelled in random directions
for a distance of Δz = 0.2 to approximate the lengths of real
spectra (Appendix C), but without evolution of the IGM (Appendix
B). Before we obtained the power, we divided all spectra by the
mean flux in that spectrum, to match what was done with the
data. In simulation A, the largest box, the longest mode parallel
to a box edge has log k > -2.91 s/km. Values of the power for
box A are listed under Pe in Table 12. We show the power from
McDonald et al. (2006) linearly extrapolated to z = 2 (orange x)
and the PJ05 power (J05 §6.4) from 6 HIRES and UVES spectra
(red +) with metal lines masked.



better match data we should re-run B2 using a lower γ_912 to
increase the Lyα absorption and this will increase the power,
making the difference from the data less than a factor of two.

We know that our simulations have too many low col- 
umn density lines and too few with high column densities. 
When we used cells 4 times smaller, these differences got 
worse, as did the difference in the power. We also noted 
that a four times reduction in cell size does not correct the 
large lack of lines with log N_HI > 17 cm^-2, lines that we
hope are excluded from the data. Kohler & Gnedin (2007)
showed that they obtain the correct number of such LLS
using 2 kpc cells. We are curious whether 2 kpc cells might 
also match the entire column density distribution and per- 
haps the power. 



10.1 What Cell Size do we Need? 

We know from our analysis of the KP series of simulations 
in fjT] that we can mimic much of the effects on the Lya 
forest of doubling the size of a box by instead increasing 
the X_228 parameter that increases the heat input per He II
ionization. We must simultaneously decrease the rate of H I
ionizations by reducing the γ_912 to bring the amount of H I

Figure 46. As Fig. 45 but on a log scale.



back to the level that gives the observed mean flux value. 
Small simulation boxes are too cold compared to large ones. 

In Figs. 33 and 38 we saw that there was very little
change in the pdf of the baryon density per cell, for typical 
densities, going from 37.5 to 18.75 kpc cells, implying that 
37.5 kpc is acceptable for our work. 

In Fig. 40 we see no convergence by even 18.75 kpc for
the median temperature at log baryon overdensity near 1.7.
Smaller cells are leading to higher median temperatures. Fig.
39, however, shows that mean temperatures are converged by
18.75 kpc, again suggesting that 37.5 kpc is small enough
for the current work.

In Fig. 42 we saw that changing the cell size from
75 kpc to 18.75 kpc has a complex effect on the power
spectrum. The power drops by 30 - 40% on large scales
(log k < -0.7 s/km) but then increases by up to 30% before
falling again on the smallest scales near the Nyquist
frequency. However, changing the cell size from 37.5 to
18.75 kpc has a much smaller effect, reducing the power
by only 5% at log k < -0.7 s/km. This implies that 37.5 kpc
is small enough for all but the highest precision work.

We see a similar rapid convergence for the b-values. In
Fig. 43 we saw that the b-value distribution changes noticeably
from 150 to 75 to 37.5 kpc cells, but the change going
to 18.75 kpc is barely detectable.

In Fig. 44 we saw that reducing the cell size from
37.5 kpc to 18.75 kpc made the column density distribution
increase by about 5% at log N_HI = 12 cm^-2 and decrease by
20% at 15 < log N_HI < 17 cm^-2.

In summary, a cell size of 37.5 kpc is adequate at this 
time, but slightly smaller cells (or correction factors) will be 
needed for the highest accuracy work. We see rapid convergence
as the cell size drops below 37.5 kpc. We will definitely
need to apply corrections if we use cells of 75 kpc or larger.



10.2 How Large a Box do we Need? 

We summarise the amount of convergence by listing the
statistical parameters in the order of increasing difference
of their A2/A ratios from unity. The mean flux (1.0008),
amount of absorption (0.995), and the frequency of the CDM
density (0.98) are the most converged quantities. Then follow
the b_σ (0.96), the flux pdf (1.05, 0.93) and the power of
the flux (0.96, 1.07). The f(N) (1.10, 0.84) is less converged
and the mode of the CDM density (0.50) shows no sign of
convergence in our boxes.
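
The ordering above can be reproduced directly from the A2/A column of Table 11. The short sketch below ranks each statistic by its largest departure from unity; the dictionary keys and the helper name `departure` are assumptions introduced for illustration, and the numbers are simply copied from the table.

```python
# A2/A ratios copied from Table 11. Statistics with two table rows
# (flux pdf, flux power, f(N)) carry both values.
a2_over_a = {
    "flux mean": [1.0008],
    "absorption": [0.9946],
    "frequency of CDM density": [0.98],
    "typical line width": [0.959],
    "flux pdf": [0.93, 1.05],
    "flux power": [0.959, 1.074],
    "f(N)": [1.10, 0.84],
    "mode of CDM density": [0.50],
}

def departure(ratios):
    """Largest absolute difference of any listed ratio from unity."""
    return max(abs(r - 1.0) for r in ratios)

# Most converged statistic first, least converged last.
ranked = sorted(a2_over_a, key=lambda name: departure(a2_over_a[name]))
for name in ranked:
    print(f"{name:26s} {departure(a2_over_a[name]):.4f}")
```

Using the largest departure per statistic is one possible choice; ranking by the mean departure would swap no entries for these particular values, but that is not guaranteed in general.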

Some of the CDM statistics converge while others do 
not. In Fig. 2 we see convergence in the frequencies of the
CDM densities. We also see convergence with the power of
the CDM density in Fig. 5; however, the mode of the CDM
density distribution in Fig. 4 shows no sign of converging as
the box size increases. We discussed how this was caused by 
the rare very high density regions in the larger boxes. These 
high density regions are not in the low density IGM and yet 
they dominate many of the CDM statistics, including the 
power and the pdf. 

To first order, the values in Table 11 show that doubling
the size of a box typically halves the difference of a parameter
value from its value in the largest box. If this trend
were to continue unchanged, we know from the sum of the
geometric series 1/2 + 1/4 + 1/8 + ... that the value in a very
large box would be approximately the value in A plus the
difference from A2 to A. In practice we can expect the series
to converge more quickly as the box size increases past several
hundred Mpc to include the peak in the matter power
(Bagla & Ray 2005). Hence the values of the parameters in
a very large box would be similar to the value in A plus the
value in the column A2-A in Table 12.
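
The geometric-series argument can be written out as a few lines of arithmetic. The sketch below is illustrative only: the function name, the default halving ratio and the number of doublings are assumptions, and the inputs are the A and A2-A entries of Table 12 used with the signs given there.

```python
def extrapolate_to_large_box(value_a, step_a2_to_a, ratio=0.5, n_doublings=60):
    """Add the steps expected from further box doublings.

    Each further doubling is assumed to change the statistic by `ratio`
    times the previous step, so the added steps sum to
    step_a2_to_a * (1/2 + 1/4 + 1/8 + ...) -> step_a2_to_a.
    """
    total = value_a
    step = step_a2_to_a
    for _ in range(n_doublings):
        step *= ratio
        total += step
    return total

# Entries from Table 12: value in box A and the A2-A column.
b_sigma_large = extrapolate_to_large_box(26.7, 1.1)          # km/s
flux_mean_large = extrapolate_to_large_box(0.8714, -0.0007)
print(f"b_sigma -> {b_sigma_large:.2f} km/s, mean flux -> {flux_mean_large:.4f}")
```

If the series converges faster than a factor of two per doubling, as the text anticipates for boxes of several hundred Mpc, passing a smaller `ratio` gives a correspondingly smaller correction.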

We can estimate the change in b_σ if we had run simulation
A with a resolution of 18.75 kpc instead of 75 kpc,
using the scaling relations given in J05; the value of b_σ would
change from 26.7 km s^-1 to 25.1 km s^-1. If the b-values converge
with box size as do the other statistics, which we have
not established, then we might guess that a much larger box,
many hundreds of Mpc in size, with 18.75 kpc cells would
give b_σ = 25.1 + 1.1 = 26.2 km s^-1, which is 2.6 km s^-1 larger
than the data.

This comparison with the measurement errors for data 
suggests that the box size is a relevant but probably not 
the dominant error for our largest box. The only exception 
is the line widths where the difference between the values 
from our two largest boxes is similar to the measurement 
error. We would like boxes several times larger to reduce
this uncertainty. Since the b-values are closely related to the
small scale power, we would expect that larger boxes will 
also bring useful improvements in the accuracy of the small- 
scale power. 
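
The comparison in this paragraph amounts to dividing each box-size shift by the corresponding data error. The sketch below does this for the rows of Table 12; the variable names and the threshold of 0.5 used to flag a "comparable" error are assumptions chosen for illustration, not values from the analysis.

```python
# (A2-A shift, guessed sigma of the data), copied from Table 12.
table12 = {
    "flux mean": (-0.0007, 0.01),
    "b_sigma (km/s)": (1.1, 1.0),
    "log f (log N_HI = 14.3)": (-0.03, 0.2),
    "flux P(k=0.01)": (0.23, 1.0),
    "flux P(k=0.1)": (-0.004, 0.05),
}

# Size of the box-size shift in units of the measurement error.
shift_over_error = {
    name: abs(shift) / sigma for name, (shift, sigma) in table12.items()
}

# Hypothetical threshold: call the box-size error "comparable" to the
# data error when the shift is at least half of sigma(data).
comparable = [name for name, r in shift_over_error.items() if r >= 0.5]

for name, r in shift_over_error.items():
    print(f"{name:26s} {r:.2f}")
print("comparable to data error:", comparable)
```

With these inputs only the line width parameter is flagged, in line with the exception noted in the text.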

We can most easily detect the increase in the size of a 
simulation box in data on the smallest scales, the Lya line 
widths. This is primarily because it is easier to make high 
accuracy measurements of small scale features of the Lyα
forest than of the large scale trends, such as the power of 
the flux. 

11 PHYSICAL EXPLANATIONS

We have deliberately treated the simulations like observa- 
tional data. We concentrated on reporting how statistics 
that describe the IGM and the Lya forest change with box 
size. We have resisted the temptation to follow additional 
side line investigations that might reveal the physical causes 
of these effects. We now give a short discussion of the pos- 
sible physical explanations for the changes that we see with 
box size. 

The larger boxes differ from smaller ones in only two 
ways; they contain more cells and they contain longer mode 
perturbations that do not fit in the smaller boxes. The extra 
modes add more power in total to the simulations, leading to 
larger velocities and more kinetic energy. We will see flows 
that are coherent on larger distances, and larger velocity dif- 
ferences on a given distance scale. Changes in the (negative) 
gravitational potential energy are more complex. Potential 
wells become deeper where long modes are positive density 
fluctuations, and shallower elsewhere, giving a cancellation 
to first order. We saw in Figure 2 that non-linear effects lead
to rather complex changes in the CDM density per cell. We 
saw a decrease in the number of cells near and above the 
most common density and an increase in the number of lower 
density cells. 

We hope to explain, using the longer modes alone, all 
the changes that we see with box size. We see changes in 
both the elemental density, velocity and temperature fields 
and in the various Lya forest statistics that are composites 
of these fields. 

We saw in Fig. 5 that by z = 2 the longer modes
have evolved to add power on all scales, but especially
the largest ones. This is a Richardson-Kolmogorov cascade
(Kritsuk et al. 2007) of energy from large to small scales,
which is a non-linear effect. The study of Bagla & Ray
(2005) shows the growth in the number of halos as a result
of this cascade, where they use a fixed box size and truncate 
the initial power to sub-box scales. We do not know if these 
results are sensitive to the finite box size, the box shape and 
the periodic boundary conditions. 



11.1 Why the gas causing the Lya forest is hotter 
in larger boxes 

The gas in the IGM that causes the Lya forest is hotter in 
the larger simulation boxes because of the enhanced heating 
by shocks. We now discuss why we believe this, and not an 
alternative explanation involving reduced adiabatic cooling. 

We have discussed how the extra longer modes evolve
to give more power in the CDM and baryon density fluctuations
on all scales, and hence larger peculiar velocities.
The most obvious explanation is that the increased density
and velocity perturbations lead to faster collisions of the gas
in given cells, and more cells that experience a collision of
a given velocity. This thermalisation effect applies to the
warm-hot IGM at z ~ 0 that is not seen in H I absorption,
but it is less clear if the effect is also important for the
IGM that produces the Lyα forest at z = 2. Most cells in the
IGM have not been in any collisions by z = 2, and hence
we require that the heat from collisions spreads far beyond
the cells that have contained shocks (Cen & Ostriker 1999;
Davé & Tripp 2001).



We now discuss an alternative explanation, that the 
larger boxes are hotter because the gas that makes the Lya 
forest has cooled less than in the smaller boxes. The IGM is 
heated when H I, He I and He II are ionized. The tempera- 
ture drops in time due to Hubble expansion. Lower density 
regions expand faster and cool more, leading to the well
known increase in temperature with density for the gas
causing the Lyα forest (Hui & Gnedin 1997), as illustrated
for two of our simulations in Tytler et al. (2004, Fig. 19)
and Fig. 34 of J05. The longer modes in larger boxes may 
give rise to higher densities that reduce the adiabatic cool- 
ing from Hubble expansion. The gas that makes Lya forest 
lines is hotter in larger boxes because it expands less and 
cools less. 

However, this can not be the entire explanation. When 
we add the longer modes we make some densities higher and 
others lower. We expect the two to give opposing effects that 
will tend to cancel to first order. For this mechanism to give 
a net heating of the IGM we require that the long modes 
make a larger increase in the number of hotter cells than in 
the number of cooler cells. The asymmetry favouring heating 
comes from the distribution of the number of cells as a func- 
tion of density. There are more cells at lower densities, and 
hence when the long modes adjust all densities, there is a net 
flow of cells to higher density. We can see the pdf of cells as
a function of density in Jena et al. (2005, Fig. 34), where the
most common baryon density is near 0.2 of the mean density,
below the typical density that leads to lines with log N_HI =
12.5 cm^-2. This asymmetry is a version of the Malmquist
bias, in which we see a net increase in the number of objects
detected in a flux limited sample (Gonzalez & Faber 1997).
The effect depends on the steepness of the pdf of the density 
per cell. 

We saw in Fig. [21] that larger boxes had a smaller frac- 
tion of their cells in the density range responsible for the Lya 
forest. The temperature depends on the relative number of 
cells with different densities. In Fig. [30] we saw that there is 
a larger decrease in the number of cells with log St — com- 
pared to cells with log Sb — 1, except for the smallest box. 
Since the lower densities correspond to lower temperatures, 
we then expect the larger boxes to have fewer cells with 
lower temperatures. The trend is in the direction needed to 
make the larger boxes have hotter gas, and wider Lya lines, 
but the effects we see on the plots we just discussed all seem 
too small to offer a credible explanation. 

In Figure 14 we saw that in larger boxes the column
density distribution has relatively more lines near the higher
end of the log N_HI = 12.5 - 14.5 cm^-2 range that we use when
we measure b_σ. Such lines tend to have larger b-values. We do
not know if the b-value pdf changes at a given N_HI, or if we
can explain the larger b_σ entirely in terms of the change
in f(N).

In summary we do not find any convincing evidence for 
the second explanation, that the Lya forest lines are wider 
in larger boxes because they come from gas that cooled less. 
Rather we prefer the first explanation, that the larger boxes 
are hotter because the velocities are larger giving more and 
stronger shocks. 

11.2 Why the Lyα forest lines are wider in larger
boxes

Lyα line widths are set by three components (Bryan et al.
1999): thermal from the gas temperature, peculiar velocities
from hydrodynamical motions, and Hubble from the cosmological
expansion (Sargent et al. 1980). We have shown in
Figs. 20 and 24 that the gas temperature and peculiar velocities
increase with box size for fixed photoionization and
cosmological parameters. Hubble broadening depends on the
size of the line forming regions, which in turn depends on
the balance between the previous two gas attributes. High
temperature leads to thermal expansion and wider lines,
while increased peculiar velocities are indicative of high
compressions that tend to decrease the size. Bryan et al. (1999)
determined that in the column density range relevant to
the b-parameter distribution, 10^12.5 - 10^14.5 cm^-2, thermal
broadening increasingly dominates over Hubble broadening
at larger column densities. Therefore we conclude the Lyα
forest lines are wider in the larger boxes because there is
more thermal broadening, which is more important than the
reduction in b-values from the larger peculiar velocities.
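
For orientation, the thermal component alone can be estimated from the standard Doppler parameter of hydrogen, b = sqrt(2kT/m_H). The sketch below is a generic calculation, not part of the simulation analysis: the function name is an assumption, the peculiar velocity and Hubble contributions are not modelled, and the temperatures are illustrative.

```python
import math

K_BOLTZMANN = 1.380649e-23   # J/K
M_HYDROGEN = 1.6735575e-27   # kg, mass of a hydrogen atom

def thermal_b_kms(temperature_k):
    """Thermal Doppler parameter b = sqrt(2 k T / m_H), in km/s."""
    return math.sqrt(2.0 * K_BOLTZMANN * temperature_k / M_HYDROGEN) / 1000.0

for temperature in (1.0e4, 1.2e4, 2.0e4):
    print(f"T = {temperature:8.0f} K  ->  b_thermal = "
          f"{thermal_b_kms(temperature):.2f} km/s")
```

Because b scales as the square root of T, gas near 10^4 K gives a thermal width of roughly 13 km s^-1, well below the b_σ values of about 24 to 27 km s^-1 quoted in Table 12, which is consistent with the other two components also contributing to the line widths.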



11.3 How might we make simulations with the 
appropriate temperatures 

We discuss several potential ways of making the IGM in our 
simulations cooler and closer to the temperature required by 
Lya forest data. 

First, we should mention that there remains a slight
possibility that there is no problem with the IGM temperature,
and rather a mismatch in the comparison of the simulation
and data because the b-value distributions are not
determined using the same codes. We consider this unlikely
in part because the power spectrum measurements tell a
consistent story. The power spectrum is measured independently,
from mostly different spectra, and the higher small-scale
power in the data compared to the simulations is consistent
with the smaller b-values in the data compared to
simulations.

We have seen that increasing the resolution of our simulations
does lead to a cooler IGM with smaller b-values.
However, we can not run the ideal simulation. A simulation
with 18.75 kpc cells (like B2) in a 76.8 Mpc box (like
A) would have 8192^3 grid cells, a factor of 4^3 too large for
the supercomputers we use. An adaptive mesh refinement 
(AMR) simulation with increased resolution at the densities 
of interest is also not practical. We are interested in the fila- 
mentary structure at overdensities above 1. A simulation of 
the 76.8 Mpc box would require three levels of refinement, 
each decreasing the cell size by a factor of 2, to have an 
effective resolution equal to that of the B2 simulation. How- 
ever, the volume fraction of regions with densities above the 
cosmic mean is large, requiring a prohibitive number of re- 
finement subgrids. 

The temperature of the IGM at ionization is set in 
large part by the mean energy of the photons that cause 
the ionization. Softer spectra have steeply declining num- 
bers of photons with increasing energy above the ionization 
threshold. We could make the temperature lower by using 
an ionizing spectrum that was softer than that specified by 



Haardt and Madau, at either 1 Rydberg for H I or 4 Ry for 
He II or for both. 

Early results from simulations that include the effects
of radiation transfer suggest that this makes the IGM hotter
and exacerbates the difference with data. Paschos et al.
(2003) have carried out an approximate treatment of the helium
ionization due to discrete QSO sources and have found
that the temperature of the IGM at the cosmic mean
density can be on average 68% above that in an optically thin
simulation. This increases the widths of Lyα lines by only
about 1.3 km s^-1 at z = 2.5, because other factors (Hubble
flow, turbulence) dominate the line widths. Bolton et al.
(2004) also found an increase in the IGM temperature in
their radiative transfer calculation.

We could also decrease the temperature by decreasing 
the He/H abundance ratio. For our Haardt and Madau UVB 
spectrum, more heat is input per baryon when the baryons 
are in ^4He rather than in ^1H. Since He is ionized later than H,
if we decrease the number of baryons in He we decrease the 
temperature at z = 2. However, constraints on the primor- 
dial He abundance from standard big bang nucleosynthesis, 
from CMB anisotropy constraints on the baryon density and 
from observations of the He abundance in extragalactic H II 
regions all make this suggestion a long shot. 

The IGM might be cooler at z = 2 if we ionize it earlier,
since the temperature of the IGM is set in part by the
amount of expansion following the ionization. We now show
that the effect goes in the opposite direction, and is much
too small to be relevant at z = 2. Again, we could make the
ionization of either H I or He II or both occur earlier. We do
this by increasing the intensity of the UVB at early times.
Integrated over time from the earliest z we then need more
photons to reach a given ionization at z = 2 because recombinations
are faster when the IGM is denser. However, we
find that although cosmic expansion tends to cool down the
IGM, it is the rapid increase of the UV background intensity
with decreasing redshift, as computed by Haardt & Madau,
that causes the IGM to heat up, rather than cool down. We
demonstrate this in Fig. 47, where we show how the mean
temperature of the IGM changes with the epoch of reionization.
These results are from three simulations in 100
Mpc boxes each with 128^3 cells. We use the Haardt & Madau
ionizing spectrum described in §3. We use a very steep function
to increase the flux from zero to the normal intensity
at some high z. When we initiate the ionizing flux at redshifts
7, 8 or 9, the mean temperature is slightly higher when
we ionize earlier, and the changes are insignificant at z < 5.
These results differ from Miralda-Escudé & Rees (1994, Fig.
2), who found a steep decrease in temperature with decreasing
z because they used an ionizing spectrum that was continuously
decreasing in intensity over time, which is not now
favored by data.

Lastly, we can change the Lyα line widths by changing
the amplitude (σ_8) and shape (n_s) of the primordial power
spectrum. Fig. 21 of J05 shows that a larger σ_8 to match the
large-scale power of data will also make the b-values smaller,
as needed to match data.

We know from J05 that simulation A4 has approxi- 
mately the correct 6-values. The temperature of the IGM 
at 2 = 2, and at the mean baryon density, may be close 
to 12,000 K if 0^= 0.9 and we neglect radiative transfer. 
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our estimates of the power from lKim et alj (|2004al ) after we 
removed the power from metal hnes. However, the SDSS 
measurement of power should be as reliable as any current 
measurement, and the SDSS power is very similar to that 
from J05. 

The most obvious way to increase the large-scale power 
in the simulations is to increase ag above the large value 
of 0.9 that we used. We saw in Fig 21 of JOS that larger 
(jg values also give smaller 6-values, as required to bet- 
ter match data. We expect that larger ag also gives larger 
small scale powe r , beca use the lines are narrower. However, 
McD onald et al.l (|2002l . Fig. 10a) found the opposite, per- 
haps because his simulations use simpler representations of 
small scale physics. 

We shou ld also investigate changing the power spectrum 
tih or shape. iMcDonald et al.1 (|20Qa . Fig. 10a) suggests that 
decreasing the slope from n = 1.0 to n = 0.95 will increase 
the large-scale flux power in simulations by about 6%, which 
is the desired direction of change, but much less than the 
factor of 1.5 - 2 increase needed to match data. 



Figure 47. The mean temperature (in K) of the IGM as a func- 
tion of the epoch of reionization. The four panels from the upper 
left show the mean temperatures for increasing ranges of density. 
We use a different scale at the highest densities in the lower right 
because they are at higher temperatures. 

We expect a higher temperature is needed to match data if 
f7g> 0.9. 

11.4 Why do the simulations lack large scale 
power? 

We have seen that relative to Lya forest data our simulations
lack power on all scales, especially the largest ones, and their
lines are too wide. We also noted that the simulations have
a different f(N) distribution from the data, especially a lack
of lines with log N_HI > 17 cm^-2.

We discussed in §4.2.2 whether simulations with more
absorption from lines with log N_HI > 14 cm^-2, and especially
log N_HI > 17, would have more power on large scales. We doubt
that this will be enough to match data since we have attempted
to exclude lines with log N_HI > 17 cm^-2 from the data, and
the changes required in the column density distribution at
smaller log N_HI values are not large factors.

There are at least three other parameters that could
increase the power in the simulations: the amplitude (σ8)
and slope (ns) of the initial power, and the temperature of
the gas (X228). We would have to simultaneously adjust
γ912 to maintain the observed amount of H I absorption,
which does match data. Our simulations use relatively high
values for both σ8 = 0.9 and ns = 1.0, and a best guess for
the heating by the UVB.

The power in the simulations could be below the data
because of errors in the data. The power of the data presented
in J05 (PJ05) includes only Lya lines (no metals) with
log N_HI < 17.2 cm^-2, hence it should be directly comparable
to the simulations, but it may still contain some
metal lines. In J05 we noted that the PJ05 power spectrum
values could be too large, because they are 30% larger than



12 SUMMARY OF THE COMPARISON WITH 
DATA 

Our simulations differ from data in at least three ways:
their b-values are too large, they have too many lines with
log N_HI < 14 cm^-2 and too few with larger N_HI, and most
conspicuously, the power spectrum of their flux has too low an
amplitude. We have found that increasing the box size
does not help, while decreasing the cell size or adding
radiative transfer both make the critical differences larger. A
σ8 > 0.9 is one way in which these simulations might match
data.

While it is early to reach a conclusion, we do feel that
there is a real difference between our simulations of the IGM
and the data we are using. It is premature to conclude that
we are adding too much heat, or that we need σ8 > 0.9,
because we do not yet understand why we do not match
the power spectrum of the flux, and we need to be more
careful when we excise high column density Lya lines and
metals from both data and simulations. However, it is looking
increasingly difficult to make simulations that match the
b-values, mean flux and flux power of the Lya forest at z = 2
using popular values for the astrophysical and cosmological
parameters.

APPENDIX A: HOW WE MAKE FLUX
SPECTRA

We calculate the optical depth τ(v) using equations in §4
of J05, and the flux from F = exp(-τ), where F = 1.0
in the absence of absorption, and F = 0 at the center of
saturated absorption lines. The spectra from all simulations
are smooth functions of v, with no discreteness from the cell
size, because to determine τ at each position we integrate
over at least ±600 cells to the left and right of the absorbing
cell. At ~ 5 km/s velocity resolution at z = 2 this corresponds
to an integration range of at least 6000 km/s. The number
of cells involved in the integration is even larger when the
absorption is from a high density region that makes a DLA
line with extensive wings.
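
As a rough illustration of the integration described above, the following sketch spreads the optical depth of each cell over ±600 neighbouring cells and converts the result to flux. It is not the J05 calculation: the Gaussian profile, the Doppler parameter `b_kms`, the neglect of peculiar velocities and the function name are all assumptions made only for this example.

```python
import numpy as np

def flux_from_cells(tau_cell, dv_kms=5.0, b_kms=25.0, half_width=600):
    """Return F = exp(-tau) after spreading each cell's optical depth in velocity.

    tau_cell   : 1D array, optical depth contributed by each cell (assumed input)
    dv_kms     : velocity width of one cell
    b_kms      : Doppler parameter of the assumed Gaussian profile
    half_width : number of cells integrated on each side of a pixel
    """
    offsets = np.arange(-half_width, half_width + 1)
    # Assumed Gaussian line profile, normalised so total optical depth is conserved.
    profile = np.exp(-((offsets * dv_kms) / b_kms) ** 2)
    profile /= profile.sum()
    tau = np.zeros_like(tau_cell, dtype=float)
    for offset, weight in zip(offsets, profile):
        # np.roll wraps around the box edge, i.e. periodic boundary conditions.
        tau += weight * np.roll(tau_cell, offset)
    return np.exp(-tau)

# Full integration range in velocity: 2 * 600 cells * 5 km/s = 6000 km/s.
integration_range_kms = 2 * 600 * 5.0
```

Because the window is wide compared with any plausible line width, no pixel sees a cell abruptly enter or leave the sum, which is why the resulting spectrum shows no discreteness from the cell size.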

The bulk of the spectra that we present here, unlike 
those in J05, were made at a fixed redshift. These spectra 
are frozen in time, as if made by light travelling infinitely fast 
at the chosen z. We convert from the Mpc per grid cell into 
velocity v in km s^-1 of a spectrum using the H(z) for the
chosen z, but we do not increment the z as we move down 
a sight line, and we do not change the H(z). These frozen 
spectra are pixel to pixel identical if we pass through the 
box from the top to the bottom or in the reverse direction. 
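
The conversion from cell size to velocity at a fixed redshift can be sketched as follows. The cosmological parameters (H0 = 71 km/s/Mpc, Ωm = 0.27, ΩΛ = 0.73), the flat ΛCDM form of H(z), the 75 kpc comoving cell size and the function names are illustrative assumptions, not values quoted in this appendix.

```python
import math

def hubble_kms_per_mpc(z, h0=71.0, omega_m=0.27, omega_l=0.73):
    """H(z) for an assumed flat Lambda-CDM cosmology, in km/s/Mpc."""
    return h0 * math.sqrt(omega_m * (1.0 + z) ** 3 + omega_l)

def cell_velocity_width(dx_comoving_mpc, z, **cosmo):
    """Velocity width of one cell for a spectrum frozen at redshift z.

    The comoving cell size is converted to a proper length by dividing by
    (1 + z); the same H(z) is then used for every cell on the sight line.
    """
    return hubble_kms_per_mpc(z, **cosmo) * dx_comoving_mpc / (1.0 + z)

# Assumed 75 kpc comoving cells at z = 2.
dv = cell_velocity_width(0.075, 2.0)
```

With these assumed inputs `dv` comes out close to the ~5 km/s resolution quoted in the text. Because z and H(z) are held fixed, the velocity axis is the same in both directions of travel, consistent with the frozen spectra being pixel to pixel identical when reversed.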

Each spectrum starts on a face of the box and spans
the length of the simulation box to the opposite face, along
the z-direction. We made N^2 spectra that filled one side
of the box. As for the CDM, we have explored the other two
orthogonal directions and found no difference of interest.

We encountered two problems with the spectrum generator.
We initially truncated the integrations of the H I
number density to obtain the opacity at ±500 km s^-1, which
led to sharp cutoffs in the edges of the lines when high density
cells first enter or leave an integration. In J05 we truncated
at ±200 cells, which was approximately 250 km s^-1 in
the highest resolution simulation (B2) and 2000 km s^-1 for
the A series. The second problem was an error in the spectrum
generator which made it fail when the optical depth
exceeded 10^, giving a near vertical spike in the spectrum.
The first problem made the power spectra in all simulations
turn up to higher than correct power on scales near the
Nyquist frequency. The second problem affected 15 spectra
from the A simulations alone, since the others did not have
such high optical depths. We do not expect the second problem
to have affected the results in J05 because we most likely
did not sample high density regions with our random lines
of sight. However, we believe that the limited integration
range could have contaminated the spectra we reported in
J05 and may explain some of the odd behaviour of the flux
power that J05 discuss.



APPENDIX B: EVOLVING SPECTRA 

In addition to the spectra that we have discussed so far, all
frozen in time, we also made spectra that include an approximation
to the evolution expected in the IGM as light travels
to us. The matter power in the IGM increases with decreasing
z while the flux power decreases because the mean
amount of absorption drops rapidly (Weymann et al. 1998;
Janknecht et al. 2006; Kirkman et al. 2007). These changes
will produce some power on their own, since they make the
signals (matter density, flux, flux power) non-stationary,
because they change systematically with z.

The evolving spectra are intended to mimic the cosmological
evolution of the IGM as seen in QSO spectra. The
parameters of the simulation are read out and stored for
some z, and we then make the evolutions using scaling laws
applied as the rays propagate through the data dump. We
leave the radiation intensity constant, we scale the matter
density as (1 + z)^3 and the velocities as (1 + z). The ionization
then decreases as z increases. Since we increment the
redshift as we move along a sight line, we obtain different
spectra if we reverse our direction of passage through the
box.
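
A minimal sketch of this passive evolution along one sight line is given below. It assumes the density scales as (1 + z)^3 (the usual dilution of physical density) and the velocities as (1 + z), and that the redshift advances by (1 + z) dv/c per pixel. The function name, the pixel width and the array inputs are hypothetical.

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s

def evolve_sightline(density, velocity, dv_kms, z_start):
    """Apply passive scaling laws pixel by pixel along a sight line.

    Returns the redshift of each pixel and the rescaled density and velocity.
    """
    n = density.size
    # Each pixel of velocity width dv multiplies (1 + z) by (1 + dv / c).
    one_plus_z = (1.0 + z_start) * (1.0 + dv_kms / C_KMS) ** np.arange(n)
    scale = one_plus_z / (1.0 + z_start)
    # Assumed scalings: density as (1 + z)^3, velocities as (1 + z).
    return one_plus_z - 1.0, density * scale ** 3, velocity * scale

# Assumed example: 1024 pixels of 5 km/s starting at z = 2.
z, rho, vel = evolve_sightline(np.ones(1024), np.full(1024, 10.0), 5.0, 2.0)
```

Since the scale factor depends on position along the ray, traversing the same cells in the opposite order applies different factors to each cell, which is why reversed evolving spectra differ.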

Figure B1. The effect of cosmological evolution along the sight
lines on the power of the flux. We plot the ratio of the power of
the spectra with evolution to the same without evolution.

Over the length of a single box, the evolution is unlikely
to be significant. For example, crossing the A box, the
redshift change is 0.05 and the mean flux changes by 0.004,
which is about half of the measurement errors with current
spectra. Moreover, we see a ten times smaller change in the
mean flux when we average over a range of redshifts that is
symmetric about the central redshift: the evolution of the
mean flux with z is nearly linear over small intervals. However,
evolution is useful if we intend to traverse many data
dumps at different redshifts, since with evolution we can
more readily make smoothly evolving spectra.
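
The cancellation for a symmetric interval follows from a Taylor expansion of the mean flux about the central redshift. The symbols z_0 (central redshift), Δ (redshift interval) and the bar notation are introduced here only for illustration.

```latex
\bar F(z) \simeq \bar F(z_0) + \bar F'(z_0)\,(z - z_0)
          + \tfrac{1}{2}\,\bar F''(z_0)\,(z - z_0)^2 ,
\qquad
\frac{1}{\Delta}\int_{z_0-\Delta/2}^{z_0+\Delta/2} \bar F(z)\,dz
  \simeq \bar F(z_0) + \frac{\Delta^2}{24}\,\bar F''(z_0) .
```

The linear term, which produces the end-to-end change of roughly F'(z_0) Δ, integrates to zero over the symmetric interval, so only the small curvature term remains.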

In Fig. B1 we show the power of the flux from evolving
spectra divided by the same for the stationary spectra
that we have used up to now. The values for both the evolving
and non-evolving power spectra were given in Table 5.
Evolution has a dramatic effect on power spectra at high
frequencies obtained using the FFT algorithm. The mean
flux is different at either end of an evolving spectrum, and
hence the spectrum itself will not join, but instead has a flux
jump. When the box edge is in the wings of a line, this jump
can be 5% in flux. These jumps can increase the power in
a single spectrum at frequencies within a factor of a few of
the Nyquist frequency by orders of magnitude, and increase
the mean power from all sight lines by ten times.

There are several ways to recover the expected power
from spectra with evolution. The evolution has an extremely
mild effect on the flux and the conversion from Mpc into
velocity. Both change slowly with velocity, in ways that have
almost no effect on the small scale power. The effects that
we show in Fig. B1 are artifacts of the use of the FFT on
data with a discontinuity. To avoid the artifact, we could
fit and remove the long term trends in the spectra before
using the FFT, use a non-FFT algorithm, or
window the data, reducing the signal to zero at either end
of each spectrum. In Fig. B2 we see that applying a Welch
type window before using the FFT removes nearly all the
artifacts.

Figure B2. As Fig. B1 but after applying a Welch type window
to both the evolving and non-evolving flux spectra.
[Plot: power ratio versus log k (s/km).]
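
The FFT artifact and its removal by windowing can be reproduced with a toy spectrum. In the sketch below the "stationary" flux is a sum of three modes that fit exactly in the box, and the "evolving" flux adds a linear drift that makes the two ends differ by about 5%. The signal shapes, the array length and the function names are assumptions for illustration, not the simulated spectra.

```python
import numpy as np

def welch_window(n):
    """Welch (parabolic) window: 1 at the centre, 0 at both ends."""
    half = 0.5 * (n - 1)
    return 1.0 - ((np.arange(n) - half) / half) ** 2

def high_k_power(flux, window=None):
    """Sum of FFT power between half the Nyquist frequency and Nyquist."""
    data = flux - flux.mean()
    if window is not None:
        data = data * window
    power = np.abs(np.fft.rfft(data)) ** 2 / data.size
    return power[power.size // 2:].sum()

n = 1024
x = np.arange(n)
# Toy periodic spectrum: modes that join smoothly across the box edge.
stationary = (0.8 + 0.10 * np.cos(2 * np.pi * 3 * x / n)
              + 0.05 * np.cos(2 * np.pi * 17 * x / n)
              + 0.02 * np.cos(2 * np.pi * 60 * x / n))
# Slow linear drift, so the spectrum no longer joins at the box edge.
evolving = stationary + 0.05 * x / n

w = welch_window(n)
ratio_raw = high_k_power(evolving) / high_k_power(stationary)
ratio_windowed = high_k_power(evolving, w) / high_k_power(stationary, w)
```

In this toy case `ratio_raw` is very large because the unwindowed stationary signal has essentially no high-frequency power, while `ratio_windowed` stays of order unity. The window is applied to both spectra, as in Fig. B2, so that the ratio compares like with like.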



APPENDIX C: EXTENDED SIGHT LINES IN 
RANDOM DIRECTIONS 

We made all the spectra that we discussed so far parallel to 
the edges of a box, with the length of the box edge. 

We now discuss spectra that we made that can be of 
arbitrary length, by passing through the box multiple times 
in random directions. We can begin these spectra at ran- 
dom places in the simulation box, and send them in random 
directions. They loop through the simulation cube follow- 
ing the periodic boundary conditions, so that all the fields 
that specify the simulation vary smoothly and continuously 
along the sight line. A sight line that exits a face at 30 de- 
grees to the normal will enter the opposite face travelling 
in the original direction, and in general the spectra do not 
duplicate. Since the direction is random, we interpolate the 
pixel values using a spline fit. 
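
A sight line that wraps through a periodic box can be sketched as below. For brevity the example uses trilinear interpolation in place of the spline fit described in the text. The function name, the step size and the array layout are assumptions.

```python
import numpy as np

def sample_periodic_ray(field, start, direction, n_steps, step=1.0):
    """Sample a 3D periodic field along a straight ray of arbitrary length.

    Positions are wrapped with the modulo operator, so a ray leaving one
    face re-enters the opposite face travelling in the same direction.
    Trilinear interpolation stands in for the spline fit (an assumption).
    """
    shape = np.array(field.shape)
    unit = np.asarray(direction, dtype=float)
    unit = unit / np.linalg.norm(unit)
    t = step * np.arange(n_steps)[:, None]
    pos = (np.asarray(start, dtype=float) + t * unit) % shape
    base = np.floor(pos).astype(int)
    frac = pos - base
    values = np.zeros(n_steps)
    for corner in np.ndindex(2, 2, 2):
        corner = np.array(corner)
        # Neighbouring cell indices also wrap around the box.
        idx = (base + corner) % shape
        weight = np.prod(np.where(corner == 1, frac, 1.0 - frac), axis=1)
        values += weight * field[idx[:, 0], idx[:, 1], idx[:, 2]]
    return pos, values
```

A ray sent exactly along a box axis repeats after one box length, whereas a ray at a generic angle re-enters at a different transverse position each time and so does not duplicate.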

We have two different ways of adding evolution along
these sight lines. For short sight lines, say δz = 0.1, we use
the passive cosmological evolution that we described in §B.
For longer sight lines we can patch together the data dumps
from the simulation for different redshifts. When combined
with the passive evolution, this allows us to make spectra
that are as long as the Lya forest in a QSO spectrum. These
spectra do not capture the full variation expected of the
QSO sight line, because we have only a single volume evolving
in time. However, they are often useful. Since they have
the same length as real spectra, they respond in the same
way to division by the mean flux in each spectrum. They
also contain the entire line profile from a DLA with a high
N_HI, a line that can make the flux zero across the width of
a small box.

Figure D1. The 1D power spectra of δ_CDM for sight lines that
fill each of the eight sub-cubes that exactly fill the A6 simulation.
Each sub-cube has the same volume as A7, but unlike A7, the
sub-cubes are not periodic, and they do not have the mean power or
the mean density of the universe. The solid (blue) line extending
farthest to the left is the power for A6. The eight dashed lines
show the power for the sub-cubes. The short solid (red) line is
the power for the A7 box.



APPENDIX D: LACK OF REALISTIC 
VARIATION IN THE SIMULATIONS 

When we compare to data, we are conscious that the simu- 
lation boxes have less variation for many connected reasons. 
We can see this difference by eye since the simulated spectra 
are more uniform and lack the large variations and strong 
clumping that we see in real spectra with the same total ab- 
sorption from the low density Lya forest alone. This com- 
parison is difficult because the real spectra include strong 
Lya lines and metals that we need to mask and ignore. 

At the physical level, our simulations all have identical 
ionizing radiation, both within the volume of each box and 
from box to box. They have exactly the mean density of 
the universe, and they begin with the mean power of the 
universe. Because of the periodic boundary conditions, they 
contain only modes that fit inside the box. All the simula- 
tions shown here have the same random number seed for the 
initial phases of their power. 

In Figure D1 we see the power of δ_CDM for each of the
eight sub-cubes that exactly fill simulation A6. Each sub-cube
has the same volume as A7, and the same number and
length of sight lines, but unlike A7, these sub-cubes are not
periodic, they do not have the mean density of the universe,
and they did not begin with the mean power in the universe.
The amplitudes of the power spectra in the sub-cubes differ
by many orders of magnitude. This is much more variation
than seen in Figure 6, where we selected sight lines at random
from all eight sub-cubes. The variation in density amongst
the sub-cubes is large and it has a large effect on the
power.

Figure D2. The variance of δ_CDM in sub-cubes of A6 as a function
of their mean δ_CDM. We show the log of the mean of the
dark matter variance values vertically, and the mean δ_CDM
horizontally. Each circle applies to one of the eight sub-cubes that
together fill A6 exactly. Each sub-cube has the same volume as
A7. A7 is shown by the open square symbol and A6 by the filled
square symbol, both at δ_CDM = 1.0.

In Figure D2 we show the strong correlation between the
mean density in a sub-cube and the variance of δ_CDM.
There is a very large variation in the mean density amongst
the sub-cubes because they are only 2.4 Mpc on a side.
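
The sub-cube statistics can be sketched as follows: split a cubic density field into its eight octants and record the mean and variance of each. The function name and the input array, taken here to be density in units of the mean, are assumptions. This is not the analysis code used for the figures.

```python
from itertools import product

import numpy as np

def subcube_stats(delta):
    """Mean and variance of a cubic field in each of its eight octants.

    delta : 3D array with equal, even side lengths (density in units of
            the mean, an assumed convention).
    Returns an array of shape (8, 2) holding (mean, variance) per sub-cube.
    """
    half = delta.shape[0] // 2
    stats = []
    for i, j, k in product((0, 1), repeat=3):
        sub = delta[i * half:(i + 1) * half,
                    j * half:(j + 1) * half,
                    k * half:(k + 1) * half]
        stats.append((sub.mean(), sub.var()))
    return np.array(stats)
```

The eight means average to the mean of the full box, but no individual sub-cube is constrained to the mean density, which is the distinction the text draws between the sub-cubes and the periodic A7 box.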



